Course Outline

Introduction

  • Why and how project teams adopt Hadoop
  • How it all started
  • The Project Manager's role in Hadoop projects

Understanding Hadoop's Architecture and Key Concepts

  • HDFS
  • MapReduce
  • Other pieces of the Hadoop ecosystem

What Constitutes Big Data?

Different Approaches to Storing Big Data

HDFS (Hadoop Distributed File System) as the Foundation

How Big Data is Processed

  • The power of distributed processing

Processing Data with MapReduce

  • How data is picked apart step by step
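
A minimal sketch of the step-by-step idea, written in plain Python rather than against the Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative assumptions and not part of Hadoop. On a real cluster, each phase would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases (hypothetical helper, single process only)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

The counts are split into three small functions so that each one mirrors a single stage of the processing flow.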

The Role of Clustering in Large-Scale Distributed Processing

  • Architectural overview
  • Clustering approaches

Clustering Your Data and Processes with YARN

The Role of Non-Relational Databases in Big Data Storage

Working with Hadoop's Non-Relational Database: HBase

Data Warehousing Architectural Overview

Managing Your Data Warehouse with Hive

Running Hadoop from Shell Scripts

Working with Hadoop Streaming

Other Hadoop Tools and Utilities

Getting Started on a Hadoop Project

  • Demystifying complexity

Migrating an Existing Project to Hadoop

  • Infrastructure considerations
  • Scaling beyond your allocated resources

Hadoop Project Stakeholders and Their Toolkits

  • Developers, data scientists, business analysts and project managers

Hadoop as a Foundation for New Technologies and Approaches

Closing Remarks

Requirements

  • A general understanding of programming
  • An understanding of databases
  • Basic knowledge of Linux

Duration

  • 14 hours
