Course Outline
- Section 1: Introduction to Big Data & NoSQL
- Big Data ecosystem
- NoSQL overview
- CAP theorem
- When NoSQL is appropriate
- Columnar storage
- HBase and NoSQL
- Section 2 : HBase Intro
- Concepts and Design
- Architecture (HMaster and Region Server)
- Data integrity
- HBase ecosystem
- Lab : Exploring HBase
- Section 3 : HBase Data Model
- Namespaces, Tables and Regions
- Rows, columns, column families, versions
- HBase Shell and Admin commands
- Lab : HBase Shell
- Section 4 : Accessing HBase using the Java API
- Introduction to the Java API
- Read / Write path
- Time Series data
- Scans
- MapReduce
- Filters
- Counters
- Coprocessors
- Labs (multiple) : Using the HBase Java API to implement time series, MapReduce, filters and counters
- Section 5 : HBase Schema Design : Group session
- Students are presented with real-world use cases
- Students work in groups to come up with design solutions
- Discuss, critique and learn from multiple designs
- Labs : Implement a scenario in HBase
- Section 6 : HBase Internals
- Understanding HBase under the hood
- MemStore / HFile / WAL
- HDFS storage
- Compactions
- Splits
- Bloom Filters
- Caches
- Diagnostics
- Section 7 : HBase Installation and Configuration
- Hardware selection
- Install methods
- Common configurations
- Lab : Installing HBase
- Section 8 : HBase Ecosystem
- Developing applications using HBase
- Interacting with the rest of the Hadoop stack (MapReduce, Pig, Hive)
- Frameworks around HBase
- Advanced concepts (coprocessors)
- Labs : Writing HBase applications
- Section 9 : Monitoring and Best Practices
- Monitoring tools and practices
- Optimizing HBase
- HBase in the cloud
- Real-world use cases of HBase
- Labs : Checking HBase vitals
Requirements
- Comfortable with the Java programming language
- Comfortable in a Linux environment (navigating the command line, editing files with vi / nano)
- A Java IDE like Eclipse or IntelliJ
Lab environment:
A working HBase cluster will be provided for students. Students will need an SSH client and a browser to access the cluster.
Zero Install : There is no need to install HBase software on students’ machines!