Course Outline
To request a customized course outline for this training, please contact us.
Requirements
- An understanding of big data concepts
- Experience with Hadoop, Spark and other similar systems
- System administration experience
Testimonials
All the data and software were ready to use on a pre-prepared VM, provided by the trainer on external disks.
vyzVoice
I mostly liked the trainer giving real-life examples.
Simon Hahn
I genuinely enjoyed the trainer's broad competence.
Grzegorz Gorski
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczątka
It was very hands-on; we spent half the time actually doing things in Cloudera/Hadoop, running different commands, checking the system, and so on. The extra materials (books, websites, etc.) were really appreciated, as we will have to continue to learn. The installations were quite fun and very handy, and the cluster setup from scratch was really good.
Ericsson
Lots of hands-on exercises.
- Ericsson
The Ambari management tool. The ability to discuss practical Hadoop experiences from business cases other than telecom.
- Ericsson
I liked the VM very much. The teacher was very knowledgeable regarding the topic as well as other topics, and he was very nice and friendly. I also liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Training topics and engagement of the trainer
- Izba Administracji Skarbowej w Lublinie
Communication with the people attending the training.
Andrzej Szewczuk - Izba Administracji Skarbowej w Lublinie
The practical, hands-on parts; the theory was also presented well by Ajay.
Dominik Mazur - Capgemini Polska Sp. z o.o.
Exercises
- Capgemini Polska Sp. z o.o.
The usefulness of the exercises.
- Algomine sp.z.o.o sp.k.
I found the training good and very informative, but it could have been spread over 4 or 5 days, allowing us to go into more detail on different aspects.
- Veterans Affairs Canada
I really enjoyed the training. Anton has a lot of knowledge and laid out the necessary theory in a very accessible way. It was great that the training included a lot of interesting exercises, so we were in contact with the technology from the very beginning.
Szymon Dybczak - Algomine sp.z.o.o sp.k.
I found this course gave a great overview and quickly touched on some areas I wasn't even considering.
- Veterans Affairs Canada
I genuinely liked the hands-on exercises with the cluster, seeing the performance of nodes across the cluster and the extended functionality.
CACI Ltd
The trainer's in-depth knowledge of the subject.
CACI Ltd
Ajay was a very experienced consultant and was able to answer all our questions and even made suggestions on best practices for the project we are currently engaged on.
CACI Ltd
That I had it in the first place.
Peter Scales - CACI Ltd
The NiFi workflow exercises.
Politiets Sikkerhetstjeneste
Answers to our specific questions.
MOD BELGIUM
Related Courses
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP
21 hours. This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI, and NLP.
Apache Spark MLlib
35 hours. MLlib is Spark's machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, and collaborative filtering.
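For a flavour of the material, here is a minimal PySpark MLlib sketch: it trains a logistic regression classifier on a tiny in-memory dataset. The data and parameters are purely illustrative, and a local PySpark installation is assumed.

```python
# Minimal MLlib sketch: logistic regression on toy data (illustrative only).
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

# Toy training data: (label, features)
train = spark.createDataFrame(
    [(0.0, Vectors.dense(0.0, 1.1)),
     (1.0, Vectors.dense(2.0, 1.0)),
     (0.0, Vectors.dense(0.1, 1.3)),
     (1.0, Vectors.dense(1.9, 0.8))],
    ["label", "features"])

model = LogisticRegression(maxIter=10, regParam=0.01).fit(train)
model.transform(train).select("label", "prediction").show()

spark.stop()
```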
Apache Ambari: Efficiently Manage Hadoop Clusters
21 hours. Apache Ambari is an open-source management platform for provisioning, managing, monitoring, and securing Apache Hadoop clusters. In this instructor-led, live training, participants will learn the management tools and practices provided by Ambari to successfully manage Hadoop clusters.
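As an illustration, here is a hedged sketch of querying Ambari's REST API from Python. The host name and the default admin credentials are assumptions; adjust them for your own cluster.

```python
# Hedged sketch: list clusters via the Ambari REST API.
import requests

AMBARI = "http://ambari-host.example.com:8080"   # hypothetical host; 8080 is Ambari's default port
AUTH = ("admin", "admin")                        # default credentials, change in production
HEADERS = {"X-Requested-By": "ambari"}

# List the clusters managed by this Ambari server
resp = requests.get(f"{AMBARI}/api/v1/clusters", auth=AUTH, headers=HEADERS)
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item["Clusters"]["cluster_name"])
```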
Big Data Analytics in Health
21 hours. Big data analytics involves the process of examining large amounts of varied data sets in order to uncover correlations, hidden patterns, and other useful insights. The health industry has massive amounts of complex, heterogeneous medical and clinical data.
Apache Spark in the Cloud
21 hours. Apache Spark's learning curve rises slowly at the beginning; it takes a lot of effort to get the first return. This course aims to jump through that first tough part. After taking this course, participants will be ready to work with Apache Spark in the cloud.
Spark for Developers
21 hours. OBJECTIVE: This course will introduce Apache Spark. The students will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, the core Spark APIs, and more.
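For a taste of the interactive analysis covered, here is a small word-count sketch written as a standalone PySpark script; the input file path is hypothetical.

```python
# Sketch of Spark-shell-style interactive analysis: word count over a text file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-demo").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("data/sample.txt")           # assumed sample file
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

for word, n in counts.take(10):
    print(word, n)

spark.stop()
```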
Apache Spark SQL
7 hours. Spark SQL is Apache Spark's module for working with structured and unstructured data. Spark SQL provides information about the structure of the data as well as the computation being performed. This information can be used to perform additional optimizations.
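A minimal Spark SQL sketch: register a DataFrame as a temporary view and query it with SQL. The data below is made up for illustration.

```python
# Minimal Spark SQL sketch: temp view + SQL query over an in-memory DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparksql-demo").getOrCreate()

people = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Carol", 29)],
    ["name", "age"])
people.createOrReplaceTempView("people")

spark.sql("SELECT name, age FROM people WHERE age > 30 ORDER BY age").show()

spark.stop()
```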
Introduction to Graph Computing
28 hours. Many real-world problems can be described in terms of graphs: for example, the Web graph, the social network graph, the train network graph, and the language graph. These graphs tend to be extremely large; processing them requires a specialized set of tools and processes.
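As a small, single-machine illustration of this kind of graph computation (using NetworkX rather than a distributed engine, with made-up edges), the sketch below ranks nodes of a tiny link graph with PageRank, the sort of algorithm that distributed graph frameworks scale up to billions of edges.

```python
# Single-machine illustration of a graph computation: PageRank on a toy link graph.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("page_a", "page_b"),
    ("page_b", "page_c"),
    ("page_c", "page_a"),
    ("page_a", "page_c"),
])

# Rank nodes by the structure of their incoming links
for node, score in nx.pagerank(g).items():
    print(node, round(score, 3))
```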
Hortonworks Data Platform (HDP) for Administrators
21 hours. Hortonworks Data Platform (HDP) is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem. This instructor-led, live training (online or onsite) introduces Hortonworks Data Platform (HDP) and covers the administration of HDP clusters.
Data Analysis with Hive/HiveQL
7 hours. This course covers how to use the Hive SQL language (also known as Hive HQL, SQL on Hive, or HiveQL) for people who extract data from Hive.
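Here is a hedged sketch of running HiveQL from PySpark. It assumes Spark was built with Hive support and can reach a Hive metastore; the table and columns are hypothetical.

```python
# Hedged sketch: execute HiveQL through a Hive-enabled SparkSession.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hiveql-demo")
         .enableHiveSupport()      # requires a reachable Hive metastore
         .getOrCreate())

spark.sql("SHOW DATABASES").show()

# A typical HiveQL extraction query (hypothetical table and columns)
spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM sales
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
""").show()

spark.stop()
```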
Impala for Business Intelligence
21 hours. Cloudera Impala is an open-source massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters. Impala enables users to issue low-latency SQL queries to data stored in the Hadoop Distributed File System and Apache HBase.
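A hedged sketch of issuing such a query from Python with the impyla package; the host, port (21050 is the usual impalad client port), and table are assumptions.

```python
# Hedged sketch: low-latency SQL query against Impala via impyla's DB-API interface.
from impala.dbapi import connect

conn = connect(host="impala-host.example.com", port=21050)   # hypothetical host
cur = conn.cursor()
cur.execute("SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page LIMIT 10")
for row in cur.fetchall():
    print(row)
cur.close()
conn.close()
```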
A Practical Introduction to Stream Processing
21 hours. Stream Processing refers to the real-time processing of "data in motion", that is, performing computations on data as it is being received. Such data is read as continuous streams from data sources such as sensor events and website user activity.
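A minimal Spark Structured Streaming sketch of this idea: counting words arriving on a TCP socket in near real time. The host and port are assumptions for a local demo; run `nc -lk 9999` first to feed it input.

```python
# Minimal Structured Streaming sketch: streaming word count from a local socket.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

counts = (lines.select(explode(split(lines.value, " ")).alias("word"))
               .groupBy("word").count())

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```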
Magellan: Geospatial Analytics on Spark
14 hours. Magellan is an open-source distributed execution engine for geospatial analytics on big data. Implemented on top of Apache Spark, it extends Spark SQL and provides a relational abstraction for geospatial analytics. This instructor-led, live training covers geospatial analytics with Magellan on Apache Spark.
Python and Spark for Big Data (PySpark)
21 hours. Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this instructor-led, live training, participants will learn how to use Python and Spark together to analyze big data.
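A short PySpark sketch of this Python-to-Spark interface: load a CSV, filter it, and aggregate. The file path and column names are made up for illustration.

```python
# Short PySpark sketch: read a CSV, transform, and aggregate (illustrative data).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

orders = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("data/orders.csv"))            # hypothetical input file

(orders
 .filter(F.col("amount") > 0)
 .groupBy("country")
 .agg(F.sum("amount").alias("revenue"))
 .orderBy(F.desc("revenue"))
 .show(10))

spark.stop()
```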