Course Outline
1: HDFS (17%)
- Explain the role of HDFS Daemons
- Describe the standard operation of an Apache Hadoop cluster, encompassing both data storage and processing functions.
- Identify current computing system features that drive the need for systems like Apache Hadoop.
- Classify the primary objectives of HDFS Design.
- Determine the appropriate use case for HDFS Federation within a given scenario.
- Identify the components and daemons of an HDFS HA-Quorum cluster.
- Analyze the role of HDFS security mechanisms, including Kerberos.
- Select the optimal data serialization choice for a specific scenario.
- Describe the pathways for file read and write operations.
- Identify the commands required to manipulate files using the Hadoop File System Shell.
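To make the storage-model objectives above concrete, here is a minimal Python sketch of HDFS block accounting. The block size and replication factor assume stock HDFS defaults (128 MB blocks, replication factor 3); the helper name is ours, not a Hadoop API.

```python
import math

def hdfs_storage(file_size_bytes, block_size=128 * 1024 * 1024, replication=3):
    """Estimate the block count and raw DataNode storage for one file.

    Assumes stock HDFS defaults: dfs.blocksize = 128 MB, replication = 3.
    The last block may be partial; HDFS stores only the actual bytes.
    """
    blocks = math.ceil(file_size_bytes / block_size)
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes

# A 1 GiB file splits into 8 blocks and consumes 3 GiB of raw storage.
blocks, raw = hdfs_storage(1024 * 1024 * 1024)
```

This is also why HDFS design favors large files: a 1 GiB file occupies 8 NameNode block records, while the same data in 10 KB files would occupy over 100,000.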
2: YARN and MapReduce version 2 (MRv2) (17%)
- Understand the impact on cluster settings when upgrading from Hadoop 1 to Hadoop 2.
- Comprehend the deployment of MapReduce v2 (MRv2 / YARN), including all associated YARN daemons.
- Understand the core design strategy for MapReduce v2 (MRv2).
- Determine how YARN manages resource allocations.
- Identify the workflow of a MapReduce job executing on YARN.
- Identify the specific files that must be modified and how to do so when migrating a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN.
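The resource-allocation objective above can be sketched in a few lines: YARN schedulers normalize each container request up to a multiple of `yarn.scheduler.minimum-allocation-mb` and cap it at `yarn.scheduler.maximum-allocation-mb`. The defaults and function name below are illustrative, not a YARN API.

```python
import math

def normalize_request(requested_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    """Round a container memory request the way YARN schedulers do:
    up to the nearest multiple of yarn.scheduler.minimum-allocation-mb,
    capped at yarn.scheduler.maximum-allocation-mb (defaults assumed)."""
    granted = math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb
    return min(max(granted, min_alloc_mb), max_alloc_mb)

# A 1500 MB request is granted a 2048 MB container.
granted = normalize_request(1500)
```

Requests above the maximum are clamped, which is a common source of "my container got less memory than I asked for" surprises when tuning MRv2 jobs.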
3: Hadoop Cluster Planning (16%)
- Highlight key considerations for selecting hardware and operating systems to host an Apache Hadoop cluster.
- Analyze the options available when selecting an operating system.
- Understand kernel tuning and disk swapping processes.
- Identify an appropriate hardware configuration for a given scenario and workload pattern.
- Determine the necessary ecosystem components for a cluster to meet SLA requirements in a specific scenario.
- Cluster Sizing: Identify workload specifics, including CPU, memory, storage, and disk I/O, based on a scenario and execution frequency.
- Disk Sizing and Configuration: Understand JBOD versus RAID, SANs, virtualization, and disk sizing requirements within a cluster.
- Network Topologies: Understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario.
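A back-of-envelope sketch of the cluster-sizing arithmetic above: logical data volume times the HDFS replication factor, plus headroom for temporary and intermediate data. The 25% margin is an assumed planning figure, not a Hadoop constant.

```python
def raw_storage_needed_tb(daily_ingest_tb, retention_days,
                          replication=3, temp_margin=0.25):
    """Back-of-envelope raw disk estimate for cluster sizing.

    Reserves `temp_margin` (here an assumed 25%) of total capacity for
    shuffle/temporary data on top of triple-replicated HDFS data.
    """
    logical_tb = daily_ingest_tb * retention_days
    replicated_tb = logical_tb * replication
    return replicated_tb / (1 - temp_margin)

# 1 TB/day retained for 90 days: 90 TB logical, 270 TB replicated,
# 360 TB raw disk once the temp-space margin is included.
raw_tb = raw_storage_needed_tb(1, 90)
```

Dividing the result by per-node disk capacity (JBOD, not RAID, for DataNodes) gives a first estimate of node count for the scenario.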
4: Hadoop Cluster Installation and Administration (25%)
- Identify how the cluster handles disk and machine failures in a given scenario.
- Analyze logging configuration and the format of logging configuration files.
- Understand the fundamentals of Hadoop metrics and cluster health monitoring.
- Identify the function and purpose of available tools for cluster monitoring.
- Install all ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig.
- Identify the function and purpose of available tools for managing the Apache Hadoop file system.
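One concrete number behind the machine-failure handling covered above: the NameNode declares a DataNode dead after 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval, and only then re-replicates its blocks. A minimal sketch, assuming the shipped defaults:

```python
def datanode_timeout_seconds(heartbeat_interval_s=3, recheck_interval_ms=300_000):
    """Seconds after which the NameNode marks a DataNode dead, per the
    stock formula: 2 * dfs.namenode.heartbeat.recheck-interval
    + 10 * dfs.heartbeat.interval (HDFS defaults assumed)."""
    return 2 * (recheck_interval_ms / 1000) + 10 * heartbeat_interval_s

# With defaults this is 630 seconds, i.e. 10.5 minutes.
timeout = datanode_timeout_seconds()
```

The long timeout is deliberate: it avoids a re-replication storm every time a node reboots or a switch blips.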
5: Resource Management (10%)
- Understand the overall design goals of each of the Hadoop schedulers.
- Determine how the FIFO Scheduler allocates cluster resources in a given scenario.
- Determine how the Fair Scheduler allocates cluster resources under YARN in a given scenario.
- Determine how the Capacity Scheduler allocates cluster resources in a given scenario.
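The Fair Scheduler objective above can be illustrated by its instantaneous fair share: cluster resources divided among active queues in proportion to their weights. The queue names and weights below are hypothetical; this mirrors the allocation rule, not the scheduler's actual API.

```python
def fair_shares(cluster_mb, queue_weights):
    """Instantaneous fair share per queue under the Fair Scheduler model:
    cluster memory split in proportion to queue weight.
    Queue names and weights here are hypothetical examples."""
    total_weight = sum(queue_weights.values())
    return {q: cluster_mb * w / total_weight for q, w in queue_weights.items()}

# A 100 GiB cluster with weights 2:1:1 yields 50/25/25 GiB shares.
shares = fair_shares(102_400, {"prod": 2, "dev": 1, "adhoc": 1})
```

The FIFO Scheduler, by contrast, would hand the whole cluster to the oldest job's requests first, and the Capacity Scheduler would partition along configured queue capacities rather than live weights.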
6: Monitoring and Logging (15%)
- Understand the functions and features of Hadoop’s metric collection capabilities.
- Analyze the NameNode and JobTracker Web UIs.
- Understand methods for monitoring cluster Daemons.
- Identify and monitor CPU usage on master nodes.
- Describe methods for monitoring swap and memory allocation across all nodes.
- Identify methods for viewing and managing Hadoop’s log files.
- Interpret a log file.
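The log-interpretation objective above can be practiced with a small parser for the line layout produced by Hadoop's stock log4j configuration (timestamp, level, emitting class, message). The sample line below is invented for illustration.

```python
import re

# Typical Hadoop log4j line layout (assumed from the stock log4j.properties):
#   2015-06-01 12:00:01,234 WARN org.apache...DataNode: Slow BlockReceiver
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<cls>\S+): (?P<msg>.*)$"
)

def parse_line(line):
    """Split one Hadoop daemon log line into timestamp, level, class, message."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

sample = ("2015-06-01 12:00:01,234 WARN "
          "org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver")
parsed = parse_line(sample)
```

Grouping parsed lines by level and class is usually the quickest way to separate routine INFO chatter from the WARN/ERROR entries that matter during an incident.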
Requirements
- Foundational Linux administration skills
- Basic programming proficiency
35 Hours
Testimonials (3)
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczatka
Course - Administrator Training for Apache Hadoop
I genuinely enjoyed the trainer's deep competence.
Grzegorz Gorski
Course - Administrator Training for Apache Hadoop
I mostly liked the trainer giving real-life examples.