Course Outline

Big Data Overview:

  • Defining Big Data
  • Reasons behind the growing popularity of Big Data
  • Case studies illustrating Big Data applications
  • Key characteristics of Big Data
  • Solutions tailored for processing Big Data

Hadoop & Its Components:

  • Introduction to Hadoop and its core components
  • Hadoop architecture and its capabilities for handling and processing data
  • A brief history of Hadoop, key companies adopting it, and the motivations behind its adoption
  • Detailed explanation of the Hadoop framework and its components
  • Understanding HDFS and the mechanics of reading from and writing to the Hadoop Distributed File System
  • Procedures for setting up a Hadoop cluster in various modes: standalone, pseudo-distributed, and multi-node
  • Configuring a Hadoop cluster within VirtualBox, KVM, or VMware, including essential network configuration, starting Hadoop daemons, and validating cluster functionality
  • Introduction to the MapReduce framework and its operational mechanics
  • Executing MapReduce jobs on a Hadoop cluster
  • Concepts of replication, mirroring, and rack awareness within Hadoop clusters

Hadoop Cluster Planning:

  • Strategies for planning your Hadoop cluster
  • Aligning hardware and software requirements for optimal cluster planning
  • Analyzing workloads to prevent failures and ensure peak performance

What is MapR and Why Choose MapR:

  • Overview of MapR and its architecture
  • Understanding and utilizing the MapR Control System, MapR Volumes, snapshots, and mirrors
  • Planning a cluster specifically within the MapR context
  • Comparing MapR against other distributions and Apache Hadoop
  • Installing MapR and deploying the cluster

Cluster Setup & Administration:

  • Managing services, nodes, snapshots, mirrored volumes, and remote clusters
  • Understanding and managing cluster nodes
  • Understanding Hadoop components and installing them alongside MapR services
  • Accessing data on the cluster, including via NFS
  • Managing data through volumes
  • Handling users and groups
  • Assigning roles to nodes
  • Commissioning and decommissioning nodes
  • Performing cluster administration and performance monitoring
  • Configuring and analyzing metrics for performance insights
  • Administering MapR security
  • Working with M7 Native storage for MapR tables
  • Configuring and tuning the cluster for optimal performance

Cluster Upgrade and Integration with Other Setups:

  • Upgrading MapR software versions and understanding different upgrade types
  • Configuring the MapR cluster to access an HDFS cluster
  • Setting up a MapR cluster on Amazon Elastic MapReduce

All topics above include demonstrations and practice sessions to provide learners with hands-on experience with the technology.

Requirements

  • Foundational knowledge of the Linux File System
  • Basic Java programming skills
  • Familiarity with Apache Hadoop (recommended)

Duration

  • 28 hours
