Hadoop and Spark for Administrators Training Course

Apache Hadoop is a widely used open-source framework for storing and processing large datasets across clusters of computers.

This instructor-led training (online or in-person) targets system administrators looking to master the setup, deployment, and management of Hadoop clusters within their organization.

By the end of this course, participants will be able to:

Install and configure Apache Hadoop.
Understand the four core modules of Hadoop: HDFS, MapReduce, YARN, and Hadoop Common.
Use the Hadoop Distributed File System (HDFS) to scale a cluster to hundreds or thousands of nodes.
Configure HDFS as the storage engine for on-premise Spark deployments.
Set up Spark to access alternative storage solutions such as Amazon S3 and NoSQL databases such as Redis, Elasticsearch, Couchbase, and Aerospike.
Perform administrative tasks including provisioning, management, monitoring, and securing an Apache Hadoop cluster.

Course Format

Interactive lecture and discussion.
Extensive exercises and practice sessions.
Hands-on implementation in a live-lab environment.

Customization Options for the Course

To request customized training for this course, please contact us.

This course is available as onsite live training in the United Arab Emirates or as online live training.

Course Outline

Introduction

Introduction to Cloud Computing and Big Data solutions
Overview of Apache Hadoop Features and Architecture

Setting up Hadoop

Planning a Hadoop cluster (on-premise, cloud, etc.)
Selecting the OS and Hadoop distribution
Provisioning resources (hardware, network, etc.)
Downloading and installing the software
Sizing the cluster for flexibility

Working with HDFS

Understanding the Hadoop Distributed File System (HDFS)
Overview of HDFS Command Reference
Accessing HDFS
Performing Basic File Operations on HDFS
Using S3 as a complement to HDFS

Overview of MapReduce

Understanding Data Flow in the MapReduce Framework
Map, Shuffle, Sort and Reduce
Demo: Computing Top Salaries
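
As a rough illustration of the Map, Shuffle, Sort and Reduce phases listed above, here is a minimal sketch in plain Python rather than Hadoop itself. The sample records, the function names, and the choice of "top salary per department" as the computation are all assumptions made for illustration; the course demo may define the problem differently.

```python
from collections import defaultdict

# Hypothetical sample records: (employee, department, salary).
RECORDS = [
    ("alice", "eng", 120),
    ("bob", "eng", 95),
    ("carol", "ops", 88),
    ("dave", "ops", 101),
    ("erin", "eng", 130),
]


def map_phase(records):
    """Map: emit one (department, salary) pair per input record."""
    for _name, dept, salary in records:
        yield dept, salary


def shuffle_and_sort(pairs):
    """Shuffle: group values by key. Sort: order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's values to the highest salary."""
    return {key: max(values) for key, values in grouped}


top_salaries = reduce_phase(shuffle_and_sort(map_phase(RECORDS)))
print(top_salaries)  # {'eng': 130, 'ops': 101}
```

On a real cluster each phase runs in parallel across nodes and the shuffle moves data over the network; this single-process version only shows how the data changes shape between phases.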

Working with YARN

Understanding resource management in Hadoop
Working with ResourceManager, NodeManager, and ApplicationMaster
Scheduling jobs under YARN
Scheduling for large numbers of nodes and clusters
Demo: Job scheduling

Integrating Hadoop with Spark

Setting up storage for Spark (HDFS, Amazon S3, NoSQL, etc.)
Understanding Resilient Distributed Datasets (RDDs)
Creating an RDD
Implementing RDD Transformations
Demo: Implementing a Text Search Program for Movie Titles
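
The idea behind RDD transformations, lazy steps that only run when an action asks for a result, can be sketched without a Spark installation. The example below uses plain Python generators as a stand-in and is not the Spark API. The title list and the `search_titles` helper are hypothetical.

```python
# Hypothetical movie titles; a real job would load these from HDFS or S3.
TITLES = ["The Matrix", "Toy Story", "The Godfather", "Story of Us"]


def search_titles(titles, term):
    """Lazily chain a map-like step (normalize) and a filter-like step (match)."""
    needle = term.lower()
    # Map-like step: pair each title with its lowercase form (not yet evaluated).
    normalized = ((title, title.lower()) for title in titles)
    # Filter-like step: keep the original titles whose lowercase form contains the term.
    return (title for title, lowered in normalized if needle in lowered)


matches = search_titles(TITLES, "story")  # nothing has been computed yet
result = list(matches)                    # the "action": forces evaluation
print(result)  # ['Toy Story', 'Story of Us']
```

In Spark the equivalent chain would be built from RDD transformations and triggered by an action, with the work distributed across executors instead of a single process.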

Managing a Hadoop Cluster

Monitoring Hadoop
Securing a Hadoop cluster
Adding and removing nodes
Running a performance benchmark
Tuning a Hadoop cluster to optimize performance
Backup, recovery and business continuity planning
Ensuring high availability (HA)

Upgrading and Migrating a Hadoop Cluster

Assessing workload requirements
Upgrading Hadoop
Moving from on-premise to cloud and vice versa
Recovering from failures

Troubleshooting

Summary and Conclusion

Requirements

System administration experience
Experience with Linux command line
An understanding of big data concepts

Audience

System administrators
DBAs

35 Hours

Testimonials (5)

A lot of practical examples, different ways to approach the same problem, and sometimes not so obvious tricks how to improve the current solution

Rafal - Nordea

Course - Apache Spark MLlib

very interactive...

Richard Langford

Course - SMACK Stack for Data Science

Sufficient hands-on, trainer is knowledgeable

Chris Tan

Course - A Practical Introduction to Stream Processing

Trainer's preparation & organization, and quality of materials provided on github.

Mateusz Rek - MicroStrategy Poland Sp. z o.o.

Course - Impala for Business Intelligence

Lim Meng Tee - Jobstreet.com Shared Services Sdn. Bhd.

Course - Apache Spark in the Cloud

Related Courses

Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP

Big Data Analytics in Health

Introduction to Graph Computing

Hortonworks Data Platform (HDP) for Administrators

Data Analysis with Hive/HiveQL

Impala for Business Intelligence

A Practical Introduction to Stream Processing

SMACK Stack for Data Science

Apache Spark in the Cloud

Spark for Developers

Python and Spark for Big Data (PySpark)

Apache Spark SQL

Apache Spark MLlib

Stratio: Rocket and Intelligence Modules with PySpark

Related Categories

Hadoop

Apache Spark
