Course Outline

  • Section 1: Introduction to Big Data & NoSQL
    • Big Data ecosystem
    • NoSQL overview
    • CAP theorem
    • When is NoSQL appropriate
    • Columnar storage
    • HBase and NoSQL
  • Section 2 : HBase Intro
    • Concepts and Design
    • Architecture (HMaster and Region Server)
    • Data integrity
    • HBase ecosystem
    • Lab : Exploring HBase
  • Section 3 : HBase Data model
    • Namespaces, Tables and Regions
    • Rows, columns, column families, versions
    • HBase Shell and Admin commands
    • Lab : HBase Shell
  • Section 4 : Accessing HBase using Java API
    • Introduction to Java API
    • Read / Write path
    • Time Series data
    • Scans
    • MapReduce
    • Filters
    • Counters
    • Coprocessors
    • Labs (multiple) : Using the HBase Java API to implement time series, MapReduce, filters and counters
  • Section 5 : HBase schema Design : Group session
    • students are presented with real world use cases
    • students work in groups to come up with design solutions
    • discuss / critique and learn from multiple designs
    • Labs : implement a scenario in HBase
  • Section 6 : HBase Internals
    • Understanding HBase under the hood
    • MemStore / HFile / WAL
    • HDFS storage
    • Compactions
    • Splits
    • Bloom Filters
    • Caches
    • Diagnostics
  • Section 7 : HBase installation and configuration
    • hardware selection
    • install methods
    • common configurations
    • Lab : installing HBase
  • Section 8 : HBase ecosystem
    • developing applications using HBase
    • interacting with the rest of the Hadoop stack (MapReduce, Pig, Hive)
    • frameworks around HBase
    • advanced concepts (coprocessors)
    • Labs : writing HBase applications
  • Section 9 : Monitoring And Best Practices
    • monitoring tools and practices
    • optimizing HBase
    • HBase in the cloud
    • real world use cases of HBase
    • Labs : checking HBase vitals

Requirements

  • comfortable with the Java programming language
  • comfortable in a Linux environment (navigate the command line, edit files with vi / nano)
  • A Java IDE like Eclipse or IntelliJ

Lab environment:

A working HBase cluster will be provided for students. Students will need an SSH client and a browser to access the cluster.

Zero Install : There is no need to install HBase software on students’ machines!

  21 Hours
 
