Course Outline

  • Section 1: Introduction to Hadoop
    • Hadoop history and concepts
    • Ecosystem
    • Distributions
    • High-level architecture
    • Hadoop myths
    • Hadoop challenges
    • Hardware / software
    • Lab: first look at Hadoop
  • Section 2: HDFS Overview
    • Concepts (horizontal scaling, replication, data locality, rack awareness)
    • Architecture (NameNode, Secondary NameNode, DataNode)
    • Data integrity
    • Future of HDFS: NameNode HA, Federation
    • Lab: interacting with HDFS
  • Section 3: MapReduce Overview
    • MapReduce concepts
    • Daemons: JobTracker / TaskTracker
    • Phases: driver, mapper, shuffle/sort, reducer
    • Thinking in MapReduce
    • Future of MapReduce (YARN)
    • Lab: running a MapReduce program
  • Section 4: Pig
    • Pig vs. Java MapReduce
    • Pig Latin language
    • User-defined functions
    • Understanding Pig job flow
    • Basic data analysis with Pig
    • Complex data analysis with Pig
    • Multiple datasets with Pig
    • Advanced concepts
    • Lab: writing Pig scripts to analyze / transform data
  • Section 5: Hive
    • Hive concepts
    • Architecture
    • SQL support in Hive
    • Data types
    • Table creation and queries
    • Hive data management
    • Partitions & joins
    • Text analytics
    • Labs (multiple): creating Hive tables and running queries, joins, using partitions, using text analytics functions
  • Section 6: BI Tools for Hadoop
    • BI tools and Hadoop
    • Overview of the current BI tools landscape
    • Choosing the best tool for the job

Requirements

  • Programming background with databases / SQL
  • Basic knowledge of Linux (able to navigate the Linux command line and edit files with vi / nano)

Lab environment

Zero install: there is no need to install Hadoop software on students’ machines! A working Hadoop cluster will be provided for students.

Students will need the following

  21 Hours
 


Related Courses

Hortonworks Data Platform (HDP) for Administrators

  21 hours

Apache Ambari: Efficiently Manage Hadoop Clusters

  21 hours

Impala for Business Intelligence

  21 hours

Data Analysis with Hive/HiveQL

  7 hours

Fintech: A Practical Introduction for Managers

  14 hours

Matlab for Prescriptive Analytics

  14 hours

Software Engineering, Requirements Engineering and Testing

  63 hours

Model Based Development for Embedded Systems

  21 hours

Requirements Analysis

  21 hours

Hadoop Administration

  21 hours

Administrator Training for Apache Hadoop

  35 hours

Hadoop Administration on MapR

  28 hours

Hadoop for Developers (4 days)

  28 hours

Advanced Hadoop for Developers

  21 hours

HBase for Developers

  21 hours