Course Outline
Introduction
Understanding Hadoop's Architecture and Key Concepts
Understanding the Hadoop Distributed File System (HDFS)
- Overview of HDFS and its Architectural Design
- Interacting with HDFS
  - Performing Basic File Operations on HDFS
  - Overview of HDFS Command Reference
- Overview of Snakebite
  - Installing Snakebite
  - Using the Snakebite Client Library
  - Using the CLI Client
Learning the MapReduce Programming Model with Python
- Overview of the MapReduce Programming Model
- Understanding Data Flow in the MapReduce Framework
  - Map
  - Shuffle and Sort
  - Reduce
- Using the Hadoop Streaming Utility
  - Understanding How the Hadoop Streaming Utility Works
  - Demo: Implementing the WordCount Application in Python
- Using the mrjob Library
  - Overview of mrjob
  - Installing mrjob
  - Demo: Implementing the WordCount Algorithm Using mrjob
  - Understanding How a MapReduce Job Written with the mrjob Library Works
  - Executing a MapReduce Application with mrjob
  - Hands-on: Computing Top Salaries Using mrjob
Learning Pig with Python
- Overview of Pig
- Demo: Implementing the WordCount Algorithm in Pig
- Configuring and Running Pig Scripts and Pig Statements
  - Using the Pig Execution Modes
  - Using the Pig Interactive Mode
  - Using the Pig Batch Mode
- Understanding the Basic Concepts of the Pig Latin Language
  - Using Statements
  - Loading Data
  - Transforming Data
  - Storing Data
- Extending Pig's Functionality with Python UDFs
  - Registering a Python UDF File
  - Demo: A Simple Python UDF
  - Demo: String Manipulation Using a Python UDF
  - Hands-on: Calculating the 10 Most Recent Movies Using a Python UDF
Using Spark and PySpark
- Overview of Spark
- Demo: Implementing the WordCount Algorithm in PySpark
- Overview of PySpark
  - Using an Interactive Shell
  - Implementing Self-Contained Applications
- Working with Resilient Distributed Datasets (RDDs)
  - Creating RDDs from a Python Collection
  - Creating RDDs from Files
  - Implementing RDD Transformations
  - Implementing RDD Actions
- Hands-on: Implementing a Text Search Program for Movie Titles with PySpark
Managing Workflow with Python
- Overview of Apache Oozie and Luigi
- Installing Luigi
- Understanding Luigi Workflow Concepts
  - Tasks
  - Targets
  - Parameters
- Demo: Examining a Workflow that Implements the WordCount Algorithm
- Working with Hadoop Workflows that Control MapReduce and Pig Jobs
  - Using Luigi's Configuration Files
  - Working with MapReduce in Luigi
  - Working with Pig in Luigi
Summary and Conclusion
Requirements
- Experience with Python programming
- Basic familiarity with Hadoop
Testimonials
The fact that all the data and software were ready to use on an already prepared VM, provided by the trainer on external disks.
vyzVoice
I mostly liked the trainer giving real-life examples.
Simon Hahn
I genuinely enjoyed the trainer's great competence.
Grzegorz Gorski
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczątka
It was very hands-on: we spent half the time actually doing things in Cloudera/Hadoop, running different commands, checking the system, and so on. The extra materials (books, websites, etc.) were really appreciated; we will have to continue learning. The installations were quite fun and very handy, and the cluster setup from scratch was really good.
Ericsson
Lot of hands-on exercises.
- Ericsson
Ambari management tool. Ability to discuss practical Hadoop experiences from other business case than telecom.
- Ericsson
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Training topics and engagement of the trainer
- Izba Administracji Skarbowej w Lublinie
Communication with people attending training.
Andrzej Szewczuk - Izba Administracji Skarbowej w Lublinie
The practical exercises; the theory was also presented well by Ajay.
Dominik Mazur - Capgemini Polska Sp. z o.o.
Exercises
- Capgemini Polska Sp. z o.o.
Usefulness of the exercises
- Algomine sp.z.o.o sp.k.
I found the training good and very informative, but it could have been spread over 4 or 5 days, allowing us to go into more detail on different aspects.
- Veterans Affairs Canada
I really enjoyed the training. Anton has a lot of knowledge and laid out the necessary theory in a very accessible way. It is great that the training included a lot of interesting exercises, so we were working hands-on with the technology from the very beginning.
Szymon Dybczak - Algomine sp.z.o.o sp.k.
I found this course gave a great overview and quickly touched some areas I wasn't even considering.
- Veterans Affairs Canada
I genuinely liked the exercises on the cluster, seeing the performance of nodes across the cluster, and the extended functionality.
CACI Ltd
The trainer's in-depth knowledge of the subject
CACI Ltd
Ajay was a very experienced consultant and was able to answer all our questions and even made suggestions on best practices for the project we are currently engaged on.
CACI Ltd
That I had it in the first place.
Peter Scales - CACI Ltd
The NiFi workflow exercises
Politiets Sikkerhetstjeneste
Answers to our specific questions