Course Outline

Introduction

  • Why and how project teams adopt Hadoop
  • How it all started
  • The Project Manager's role in Hadoop projects

Understanding Hadoop's Architecture and Key Concepts

  • HDFS
  • MapReduce
  • Other pieces of the Hadoop ecosystem

What Constitutes Big Data?

Different Approaches to Storing Big Data

HDFS (Hadoop Distributed File System) as the Foundation

How Big Data is Processed

  • The power of distributed processing

Processing Data with MapReduce

  • How data is picked apart step by step

The Role of Clustering in Large-Scale Distributed Processing

  • Architectural overview
  • Clustering approaches

Clustering Your Data and Processes with YARN

The Role of Non-Relational Databases in Big Data Storage

Working with Hadoop's Non-Relational Database: HBase

Data Warehousing Architectural Overview

Managing Your Data Warehouse with Hive

Running Hadoop from Shell Scripts

Working with Hadoop Streaming

Other Hadoop Tools and Utilities

Getting Started on a Hadoop Project

  • Demystifying complexity

Migrating an Existing Project to Hadoop

  • Infrastructure considerations
  • Scaling beyond your allocated resources

Hadoop Project Stakeholders and Their Toolkits

  • Developers, data scientists, business analysts and project managers

Hadoop as a Foundation for New Technologies and Approaches

Closing Remarks

Requirements

  • A general understanding of programming
  • An understanding of databases
  • Basic knowledge of Linux

 14 hours

Related Courses

Apache Ambari: Efficiently Manage Hadoop Clusters

 21 hours

Apache Ambari is an open-source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. In this instructor-led live training participants will learn the management tools and practices provided by Ambari to

Administrator Training for Apache Hadoop

 35 hours

Audience: The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment. Goal: Deep knowledge of Hadoop cluster

Apache Hadoop: Manipulation and Transformation of Data Performance

 21 hours

This course is intended for developers, architects, data scientists or any profile that requires access to data either intensively or on a regular basis. The major focus of the course is data manipulation and transformation. Among the tools

Hadoop Administration

 21 hours

The course is intended for IT specialists who are looking for a solution to store and process large data sets in a distributed system environment. Course goal: Gaining knowledge of Hadoop cluster

Hadoop For Administrators

 21 hours

Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan

Hadoop for Business Analysts

 21 hours

Apache Hadoop is the most popular framework for processing Big Data. Hadoop provides rich and deep analytics capability, and it is making inroads into the traditional BI analytics world. This course will introduce an analyst to the core components of

Hadoop for Developers (4 days)

 28 hours

Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course will introduce a developer to various components (HDFS, MapReduce, Pig, Hive and HBase) Hadoop

Advanced Hadoop for Developers

 21 hours

Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course delves into data management in HDFS, advanced Pig, Hive, and HBase. These advanced programming techniques will be beneficial to

Hadoop for Developers and Administrators

 21 hours

Hadoop is the most popular Big Data processing framework.

Hadoop Administration on MapR

 28 hours

Audience: This course is intended to demystify big data/Hadoop technology and to show it is not difficult to understand.

HBase for Developers

 21 hours

This course introduces HBase, a NoSQL store on top of Hadoop. The course is intended for developers who will be using HBase to develop applications, and administrators who will manage HBase clusters. We will walk a developer

Hortonworks Data Platform (HDP) for Administrators

 21 hours

Hortonworks Data Platform (HDP) is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem. This instructor-led, live training (online or onsite) introduces

Data Analysis with Hive/HiveQL

 7 hours

This course covers how to use the Hive SQL language (also known as Hive HQL, SQL on Hive, or HiveQL) and is aimed at people who extract data from Hive

Impala for Business Intelligence

 21 hours

Cloudera Impala is an open-source, massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters. Impala enables users to issue low-latency SQL queries to data stored in the Hadoop Distributed File System and Apache

Apache Avro: Data Serialization for Distributed Applications

 14 hours

Audience: Developers. Format of the Course: Lectures, hands-on practice, small tests along the way to gauge understanding