Big Data Overview:
- What is Big Data
- Why Big Data is gaining popularity
- Big Data Case Studies
- Big Data Characteristics
- Solutions for working with Big Data
Hadoop & Its components:
- What is Hadoop and what are its components.
- Hadoop architecture and the characteristics of the data it can handle and process.
- A brief history of Hadoop, the companies using it, and why they adopted it.
- The Hadoop framework and its components, explained in detail.
- What is HDFS, and how reads and writes to the Hadoop Distributed File System work.
- How to set up a Hadoop cluster in different modes: standalone, pseudo-distributed, and multi-node.
(This includes setting up a Hadoop cluster in VirtualBox, KVM, or VMware; network configuration that needs careful attention; running the Hadoop daemons; and testing the cluster.)
- What is the MapReduce framework and how it works.
- Running MapReduce jobs on a Hadoop cluster.
- Understanding replication, mirroring, and rack awareness in the context of Hadoop clusters.
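As a rough illustration of the MapReduce model listed above, the following sketch simulates the map, shuffle, and reduce phases of a word count in plain Python. It runs on a single machine and does not use Hadoop itself; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on hadoop", "hadoop stores big data"]))
```

On a real cluster the mappers and reducers run in parallel on different nodes and the shuffle moves data across the network; this sketch only shows the data flow between the phases.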
Hadoop Cluster Planning:
- How to plan your Hadoop cluster.
- Understanding hardware and software requirements to plan your Hadoop cluster.
- Understanding workloads and planning the cluster to avoid failures and achieve optimum performance.
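A back-of-the-envelope storage estimate is one common starting point for the planning topics above. The sketch below is a minimal illustration, not a sizing method taken from this course: the function name `nodes_needed` is hypothetical, the 25% headroom and 24 TB of disk per node are assumed example values, and only the replication factor of 3 reflects the usual HDFS default.

```python
import math


def nodes_needed(data_tb, replication=3, headroom=0.25, disk_per_node_tb=24.0):
    """Estimate the number of data nodes needed to store data_tb of user data.

    replication      -- copies kept of each block (3 is the usual HDFS default)
    headroom         -- assumed fraction of disk kept free for temporary data and growth
    disk_per_node_tb -- assumed usable disk capacity per node, in TB
    """
    raw_tb = data_tb * replication            # storage consumed by all replicas
    required_tb = raw_tb / (1.0 - headroom)   # total capacity so that headroom stays free
    return math.ceil(required_tb / disk_per_node_tb)


if __name__ == "__main__":
    # 100 TB of data: 300 TB raw, 400 TB with headroom, 17 nodes at 24 TB each.
    print(nodes_needed(100))
```

Real planning also has to account for CPU, memory, network, and workload patterns, which this storage-only estimate ignores.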
What is MapR and why MapR:
- Overview of MapR and its architecture.
- Understanding the MapR Control System, MapR volumes, snapshots, and mirrors, and how they work.
- Planning a cluster in context of MapR.
- Comparison of MapR with other distributions and Apache Hadoop.
- MapR installation and cluster deployment.
Cluster Setup & Administration:
- Managing services, nodes, snapshots, mirror volumes, and remote clusters.
- Understanding and managing Nodes.
- Understanding Hadoop components and installing them alongside MapR services.
- Accessing data on the cluster, including via NFS; managing services and nodes.
- Managing data by using volumes.
- Managing users and groups.
- Managing and assigning roles to nodes.
- Commissioning and decommissioning nodes.
- Cluster administration and performance monitoring.
- Configuring, analyzing, and monitoring metrics to track performance.
- Configuring and administering MapR security.
- Understanding and working with M7: native storage for MapR tables.
- Cluster configuration and tuning for optimum performance.
Cluster upgrade and integration with other setups:
- Upgrading the MapR software version and the types of upgrade.
- Configuring a MapR cluster to access an HDFS cluster.
- Setting up a MapR cluster on Amazon Elastic MapReduce.
All of the above topics include demonstrations and practice sessions so that learners gain hands-on experience with the technology.
Prerequisites:
- Basic knowledge of the Linux file system
- Basic Java
- Knowledge of Apache Hadoop (recommended)
practical things of doing, also theory was served good by Ajay
Dominik Mazur - Capgemini Polska Sp. z o.o.
Apache Ambari: Efficiently Manage Hadoop Clusters - 21 hours
Apache Ambari is an open-source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. In this instructor-led live training participants will learn the management tools and practices provided by Ambari to
Administrator Training for Apache Hadoop - 35 hours
Audience: The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment. Goal: Deep knowledge of Hadoop cluster
Apache Hadoop: Manipulation and Transformation of Data Performance - 21 hours
This course is intended for developers, architects, data scientists or any profile that requires access to data either intensively or on a regular basis. The major focus of the course is data manipulation and transformation. Among the tools
Hadoop Administration - 21 hours
The course is intended for IT specialists who are looking for a solution to store and process large data sets in a distributed system environment. Course goal: Gaining knowledge of Hadoop cluster
Hadoop for Administrators - 21 hours
Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan
Hadoop for Business Analysts - 21 hours
Apache Hadoop is the most popular framework for processing Big Data. Hadoop provides rich and deep analytics capability, and it is making inroads into the traditional BI analytics world. This course will introduce an analyst to the core components of
Hadoop for Developers (4 days) - 28 hours
Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course will introduce a developer to various components (HDFS, MapReduce, Pig, Hive and HBase) Hadoop
Advanced Hadoop for Developers - 21 hours
Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course delves into data management in HDFS, advanced Pig, Hive, and HBase. These advanced programming techniques will be beneficial to
Hadoop for Developers and Administrators - 21 hours
Hadoop is the most popular Big Data processing framework.
Hadoop for Project Managers - 14 hours
As more and more software and IT projects migrate from local processing and data management to distributed processing and big data storage, Project Managers are finding the need to upgrade their knowledge and skills to grasp the concepts and
HBase for Developers - 21 hours
This course introduces HBase – a NoSQL store on top of Hadoop. The course is intended for developers who will be using HBase to develop applications, and administrators who will manage HBase clusters. We will walk a developer
Hortonworks Data Platform (HDP) for Administrators - 21 hours
Hortonworks Data Platform (HDP) is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem. This instructor-led, live training (online or onsite) introduces
Data Analysis with Hive/HiveQL - 7 hours
This course covers how to use the Hive SQL language (also known as Hive HQL, SQL on Hive, or HiveQL) and is aimed at people who extract data from Hive
Impala for Business Intelligence - 21 hours
Cloudera Impala is an open source massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters. Impala enables users to issue low-latency SQL queries to data stored in Hadoop Distributed File System and Apache
Apache Avro: Data Serialization for Distributed Applications - 14 hours
Audience: Developers. Format of the course: lectures, hands-on practice, and small tests along the way to gauge understanding