- Introduction to Cloud Computing and Big Data solutions
- Overview of Apache Hadoop Features and Architecture
Setting up Hadoop
- Planning a Hadoop cluster (on-premise, cloud, etc.)
- Selecting the OS and Hadoop distribution
- Provisioning resources (hardware, network, etc.)
- Downloading and installing the software
- Sizing the cluster for flexibility
Working with HDFS
- Understanding the Hadoop Distributed File System (HDFS)
- Overview of HDFS Command Reference
- Accessing HDFS
- Performing Basic File Operations on HDFS
- Using S3 as a complement to HDFS
Overview of MapReduce
- Understanding Data Flow in the MapReduce Framework
- Map, Shuffle, Sort and Reduce
- Demo: Computing Top Salaries
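The map, shuffle/sort, and reduce phases listed above can be sketched locally in plain Python. This is a conceptual illustration of the MapReduce data flow, not Hadoop's actual API; the sample records and department/salary fields are hypothetical stand-ins for the "top salaries" demo.

```python
from collections import defaultdict

# Hypothetical input records: (department, employee, salary).
records = [
    ("engineering", "alice", 120_000),
    ("engineering", "bob", 95_000),
    ("sales", "carol", 88_000),
    ("sales", "dave", 102_000),
]

# Map phase: each record is turned into a (key, value) pair.
def map_phase(recs):
    for dept, _name, salary in recs:
        yield dept, salary

# Shuffle/sort phase: values are grouped by key, keys emitted in sorted order.
def shuffle_sort(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

# Reduce phase: each key's values are folded into one result (the top salary).
def reduce_phase(grouped):
    return {dept: max(salaries) for dept, salaries in grouped}

top_salaries = reduce_phase(shuffle_sort(map_phase(records)))
print(top_salaries)  # {'engineering': 120000, 'sales': 102000}
```

On a real cluster, each phase runs distributed across nodes and the shuffle moves data over the network, but the logical flow is the same.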
Working with YARN
- Understanding resource management in Hadoop
- Working with ResourceManager, NodeManager, Application Master
- Scheduling jobs under YARN
- Scheduling for large numbers of nodes and clusters
- Demo: Job scheduling
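To make the scheduling topics above concrete, here is a toy sketch of the policy behind fair scheduling: free containers are repeatedly granted to the unsatisfied job holding the fewest containers. This is an illustration of the idea only, not YARN's actual scheduler implementation; the job names and demands are hypothetical.

```python
# Toy fair-share allocation: grant containers one at a time to the
# unsatisfied job that currently holds the fewest.
def fair_schedule(demands, total_containers):
    """demands: {job: containers requested}. Returns {job: containers granted}."""
    granted = {job: 0 for job in demands}
    for _ in range(total_containers):
        # Jobs whose demand is not yet satisfied are still candidates.
        candidates = [j for j in demands if granted[j] < demands[j]]
        if not candidates:
            break
        job = min(candidates, key=lambda j: granted[j])
        granted[job] += 1
    return granted

# Small jobs reach their full demand; the large job absorbs the remainder.
print(fair_schedule({"job-a": 10, "job-b": 2, "job-c": 4}, 9))
# {'job-a': 4, 'job-b': 2, 'job-c': 3}
```

The real Fair Scheduler adds queues, weights, preemption, and locality, but the core intuition is this incremental balancing.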
Integrating Hadoop with Spark
- Setting up storage for Spark (HDFS, Amazon S3, NoSQL, etc.)
- Understanding Resilient Distributed Datasets (RDDs)
- Creating an RDD
- Implementing RDD Transformations
- Demo: Implementing a Text Search Program for Movie Titles
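The RDD concepts above hinge on one idea: transformations such as `map` and `filter` are lazy and only build an execution plan, while an action such as `collect` forces evaluation. The minimal mimic below illustrates that idea with a movie-title text search; it is NOT the real PySpark API, just a local sketch of the model.

```python
# Minimal local mimic of the RDD model: lazy transformations, eager actions.
class MiniRDD:
    def __init__(self, plan):
        self._plan = plan  # zero-arg callable producing a fresh iterator

    @classmethod
    def parallelize(cls, data):
        return cls(lambda: iter(data))

    def map(self, fn):       # transformation: returns a new plan, computes nothing
        return MiniRDD(lambda: (fn(x) for x in self._plan()))

    def filter(self, pred):  # transformation: also lazy
        return MiniRDD(lambda: (x for x in self._plan() if pred(x)))

    def collect(self):       # action: runs the whole plan
        return list(self._plan())

titles = MiniRDD.parallelize(["The Matrix", "Toy Story", "Matrix Reloaded"])
matches = titles.filter(lambda t: "matrix" in t.lower()).map(str.upper)
print(matches.collect())  # ['THE MATRIX', 'MATRIX RELOADED']
```

In real PySpark the equivalent would be `sc.parallelize(titles).filter(...).map(...).collect()`, with the plan executed in parallel across the cluster's partitions.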
Managing a Hadoop Cluster
- Monitoring Hadoop
- Securing a Hadoop cluster
- Adding and removing nodes
- Running a performance benchmark
- Tuning a Hadoop cluster to optimize performance
- Backup, recovery and business continuity planning
- Ensuring high availability (HA)
Upgrading and Migrating a Hadoop Cluster
- Assessing workload requirements
- Upgrading Hadoop
- Moving from on-premise to cloud and vice-versa
- Recovering from failures
Summary and Conclusion
Requirements
- System administration experience
- Experience with the Linux command line
- An understanding of big data concepts
Audience
- System administrators
The fact that all the data and software were ready to use on an already-prepared VM, provided by the trainer on external disks.
I mostly liked the trainer giving real-life examples.
I genuinely enjoyed the trainer's broad competence.
I genuinely enjoyed the many hands-on sessions.
It was very hands-on; we spent half the time actually doing things in Cloudera/Hadoop, running different commands, checking the system, and so on. The extra materials (books, websites, etc.) were really appreciated, as we will have to continue to learn. The installations were quite fun and very handy, and the cluster setup from scratch was really good.
Lot of hands-on exercises.
The Ambari management tool. The ability to discuss practical Hadoop experiences from business cases other than telecom.
I liked the VM very much. The teacher was very knowledgeable about the topic, as well as other topics; he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Training topics and engagement of the trainer
- Izba Administracji Skarbowej w Lublinie
Communication with people attending training.
Andrzej Szewczuk - Izba Administracji Skarbowej w Lublinie
The practical, hands-on work; the theory was also presented well by Ajay.
Dominik Mazur - Capgemini Polska Sp. z o.o.
- Capgemini Polska Sp. z o.o.
usefulness of exercises
- Algomine sp.z.o.o sp.k.
I found the training good and very informative, but it could have been spread over 4 or 5 days, allowing us to go into more detail on different aspects.
- Veterans Affairs Canada
I really enjoyed the training. Anton has a lot of knowledge and laid out the necessary theory in a very accessible way. It was great that the training included a lot of interesting exercises, so we were in contact with the technology from the very beginning.
Szymon Dybczak - Algomine sp.z.o.o sp.k.
I found this course gave a great overview and quickly touched on some areas I wasn't even considering.
- Veterans Affairs Canada
I genuinely liked the hands-on exercises with the cluster, seeing the performance of nodes across the cluster, and the extended functionality.
The trainer's in-depth knowledge of the subject
Ajay was a very experienced consultant and was able to answer all our questions and even made suggestions on best practices for the project we are currently engaged on.
That I had it in the first place.
Peter Scales - CACI Ltd
The NiFi workflow exercises
answers to our specific questions
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP - 21 hours
This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and NLP.
Apache Spark MLlib - 35 hours
MLlib is Spark's machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering.
Alluxio: Unifying Disparate Storage Systems - 7 hours
Alluxio is an open-source virtual distributed storage system that unifies disparate storage systems and enables applications to interact with data at memory speed. It is used by companies such as Intel, Baidu and Alibaba. In this instructor-led,
Apache Ambari: Efficiently Manage Hadoop Clusters - 21 hours
Apache Ambari is an open-source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. In this instructor-led live training participants will learn the management tools and practices provided by Ambari to
Big Data Analytics in Health - 21 hours
Big data analytics involves the process of examining large amounts of varied data sets in order to uncover correlations, hidden patterns, and other useful insights. The health industry has massive amounts of complex heterogeneous medical and
Spark for Developers - 21 hours
OBJECTIVE: This course will introduce Apache Spark. The students will learn how Spark fits into the Big Data ecosystem, and how to use Spark for data analysis. The course covers Spark shell for interactive data analysis, Spark
Apache Spark SQL - 7 hours
Spark SQL is Apache Spark's module for working with structured and unstructured data. Spark SQL provides information about the structure of the data as well as the computation being performed. This information can be used to perform
Introduction to Graph Computing - 28 hours
Many real world problems can be described in terms of graphs. For example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set
Hortonworks Data Platform (HDP) for Administrators - 21 hours
Hortonworks Data Platform (HDP) is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem. This instructor-led, live training (online or onsite) introduces
Data Analysis with Hive/HiveQL - 7 hours
This course covers how to use Hive SQL language (AKA: Hive HQL, SQL on Hive, HiveQL) for people who extract data from Hive
Impala for Business Intelligence - 21 hours
Cloudera Impala is an open source massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters. Impala enables users to issue low-latency SQL queries to data stored in Hadoop Distributed File System and Apache HBase.
A Practical Introduction to Stream Processing - 21 hours
Stream Processing refers to the real-time processing of "data in motion", that is, performing computations on data as it is being received. Such data is read as continuous streams from data sources such as sensor events, website user
Magellan: Geospatial Analytics on Spark - 14 hours
Magellan is an open-source distributed execution engine for geospatial analytics on big data. Implemented on top of Apache Spark, it extends Spark SQL and provides a relational abstraction for geospatial analytics. This instructor-led, live
Python and Spark for Big Data (PySpark) - 21 hours
Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this