Hadoop For Administrators Training Course

Apache Hadoop is the leading framework for processing Big Data across clusters of servers. This three-day (optionally four-day) course covers the business benefits and practical use cases of Hadoop and its ecosystem. Attendees learn how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot, and optimize Hadoop. They also practice bulk-loading data into a cluster, become familiar with the various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course concludes with a discussion of securing clusters with Kerberos.

“…The materials were meticulously prepared and covered comprehensively. The Lab was very beneficial and well-organized”
— Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising

Audience

Hadoop administrators

Format

The course combines lectures and hands-on labs, split roughly 60% lectures and 40% labs.

This course is available as onsite live training in the United Arab Emirates or as online live training.

Course Outline

Introduction
- Hadoop history, concepts
- Ecosystem
- Distributions
- High level architecture
- Hadoop myths
- Hadoop challenges (hardware / software)
- Labs: discuss your Big Data projects and problems
Planning and installation
- Selecting software, Hadoop distributions
- Sizing the cluster, planning for growth
- Selecting hardware and network
- Rack topology
- Installation
- Multi-tenancy
- Directory structure, logs
- Benchmarking
- Labs: cluster install, run performance benchmarks
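
As a rough illustration of the sizing and growth-planning topics in this section, the sketch below estimates raw disk capacity and DataNode count. It assumes the HDFS default replication factor of 3. The 25% headroom for temporary and intermediate data, the per-node usable capacity, the growth rate, and all function names are illustrative assumptions, not figures from the course.

```python
import math


def raw_capacity_tb(data_tb, replication=3, scratch_fraction=0.25):
    """Raw disk (TB) needed to hold data_tb of user data.

    replication      -- HDFS replication factor (3 is the HDFS default)
    scratch_fraction -- assumed share of disk kept free for temporary and
                        intermediate files (hypothetical planning figure)
    """
    if not 0 <= scratch_fraction < 1:
        raise ValueError("scratch_fraction must be in [0, 1)")
    return data_tb * replication / (1 - scratch_fraction)


def datanodes_needed(data_tb, usable_tb_per_node, replication=3,
                     scratch_fraction=0.25):
    """Number of DataNodes, rounded up, for the given per-node capacity."""
    raw = raw_capacity_tb(data_tb, replication, scratch_fraction)
    return math.ceil(raw / usable_tb_per_node)


def projected_data_tb(initial_tb, monthly_growth_rate, months):
    """Data volume after compound monthly growth."""
    return initial_tb * (1 + monthly_growth_rate) ** months


if __name__ == "__main__":
    today = 100  # TB of user data today (hypothetical)
    in_a_year = projected_data_tb(today, 0.05, 12)
    print(f"Raw capacity today: {raw_capacity_tb(today):.0f} TB")
    print(f"Nodes today (48 TB usable each): {datanodes_needed(today, 48)}")
    print(f"Nodes in 12 months: {datanodes_needed(in_a_year, 48)}")
```

Real sizing also depends on compression, workload mix, and memory and CPU needs, so numbers like these are only a starting point for discussion.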
HDFS operations
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring
- Command-line and browser-based administration
- Adding storage, replacing defective drives
- Labs: getting familiar with HDFS command lines
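
To make the rack-awareness concept in this section more concrete, here is a minimal sketch of a topology script. Hadoop can call an executable named in the `net.topology.script.file.name` property, passing host addresses as arguments and reading one rack path per address from standard output. The subnet-to-rack mapping and the rack names below are hypothetical; a real cluster would use its own addressing scheme.

```python
#!/usr/bin/env python3
import sys

# Hypothetical mapping from IP prefix to rack path.
RACKS = {
    "10.0.1.": "/dc1/rack1",
    "10.0.2.": "/dc1/rack2",
}

# Rack path reported for hosts that match no known prefix.
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Return the rack path for a single host address."""
    for prefix, rack in RACKS.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [rack_for(h) for h in hosts]


if __name__ == "__main__":
    # Hadoop passes the addresses to resolve as command-line arguments.
    print("\n".join(resolve(sys.argv[1:])))
```

To try a script like this, an administrator would make it executable and point `net.topology.script.file.name` in core-site.xml at its path.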
Data ingestion
- Flume for logs and other data ingestion into HDFS
- Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
- Hadoop data warehousing with Hive
- Copying data between clusters (distcp)
- Using S3 as complementary to HDFS
- Data ingestion best practices and architectures
- Labs: setting up and using Flume and Sqoop
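
For orientation, the sketch below assembles the kind of command lines used for two of the ingestion tools listed in this section: a Sqoop import from a SQL database into HDFS, and a distcp copy between clusters. It only builds and prints the commands and does not run them. The JDBC URL, user name, table, host names, and paths are hypothetical placeholders, and the helper function names are invented for this example.

```python
import shlex


def sqoop_import_cmd(jdbc_url, username, table, target_dir, mappers=4):
    """Build a 'sqoop import' command that copies one table into HDFS.

    -P makes Sqoop prompt for the password instead of taking it as an argument.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def distcp_cmd(src_uri, dst_uri):
    """Build a 'hadoop distcp' command that copies a path between clusters."""
    return ["hadoop", "distcp", src_uri, dst_uri]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(shlex.join(sqoop_import_cmd(
        "jdbc:mysql://db.example.com/sales", "etl_user",
        "orders", "/data/raw/orders")))
    print(shlex.join(distcp_cmd(
        "hdfs://nn1.example.com:8020/data/raw",
        "hdfs://nn2.example.com:8020/data/raw")))
```

Keeping each command as an argument list makes it easy to hand to `subprocess.run` later on a machine where the Hadoop and Sqoop clients are installed.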
MapReduce operations and administration
- Parallel computing before MapReduce: compare HPC vs Hadoop administration
- MapReduce cluster loads
- Nodes and daemons (JobTracker, TaskTracker)
- MapReduce UI walkthrough
- MapReduce configuration
- Job config
- Optimizing MapReduce
- Fool-proofing MR: what to tell your programmers
- Labs: running MapReduce examples
YARN: new architecture and new capabilities
- YARN design goals and implementation architecture
- New actors: ResourceManager, NodeManager, ApplicationMaster
- Installing YARN
- Job scheduling under YARN
- Labs: investigate job scheduling
Advanced topics
- Hardware monitoring
- Cluster monitoring
- Adding and removing servers, upgrading Hadoop
- Backup, recovery and business continuity planning
- Oozie job workflows
- Hadoop high availability (HA)
- Hadoop Federation
- Securing your cluster with Kerberos
- Labs: set up monitoring
Optional tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5)
- Ambari for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)

Requirements

- Comfortable with basic Linux system administration
- Basic scripting skills

Prior knowledge of Hadoop and distributed computing is not required; both are introduced and explained in the course.

Lab environment

Zero install: there is no need to install Hadoop software on students’ machines. A working Hadoop cluster will be provided for students.

Students will need the following:

- An SSH client (Linux and Mac already include one; PuTTY is recommended for Windows)
- A browser to access the cluster. We recommend Firefox with the FoxyProxy extension installed.

Duration: 21 hours

Testimonials (5)

The live examples

Ahmet Bolat - Accenture Industrial SS

Course - Python, Spark, and Hadoop for Big Data

During the exercises, James explained every step to me in more detail wherever I was getting stuck. I was completely new to NiFi. He explained the actual purpose of NiFi, even the basics such as open source. He covered every concept of NiFi, from beginner level to developer level.

Firdous Hashim Ali - MOD A BLOCK

Course - Apache NiFi for Administrators

Trainer's preparation & organization, and quality of materials provided on github.
