Course Outline
Introduction to Hortonworks Data Platform (HDP)
Overview of Big Data and Apache Hadoop
Installing and Configuring HDP
Setting up, Deploying, and Managing Hadoop Cluster
Understanding and Configuring YARN and MapReduce
Overview of Job Scheduling
Ensuring Data Integrity
Understanding Enterprise Data Movement
Using HDFS Commands & Services
Transferring Data Using Flume
Working with Hive
Scheduling Workflow Using Oozie
Exploring Hadoop 2.x
Understanding HBase Architecture
Monitoring HDP2 Services Using Ambari
New Features in HDP
Troubleshooting
Summary and Conclusion
Requirements
- An understanding of Hadoop and big data.
- An understanding of Spark.
- Familiarity with the command line.
- System administration experience.
Audience
- Hadoop administrators
Testimonials
Richard is very calm and methodical, with an analytic insight - exactly the qualities needed to present this sort of course.
Kieran Mac Kenna
Sharing concept diagrams and samples to get our hands dirty.
Mark Yang - FMR
Applicable scenarios and cases
zhaopeng liu - Fmr
case analysis
国栋 张
all parts of this session
Eric Han - Fmr
We know a lot more about the whole environment.
John Kidd
The trainer made the class interesting and entertaining which helps quite a bit with all day training.
Ryan Speelman
I think the trainer had an excellent style of combining humor and real life stories to make the subjects at hand very approachable. I would highly recommend this professor in the future.
Ernesto did a great job explaining the high level concepts of using Spark and its various modules.
Michael Nemerouf
Richard was very willing to digress when we wanted to ask semi-related questions about things not on the syllabus. Explanations were clear and he was up front about caveats in any advice he gave us.
- ARM Limited
I liked the VM very much. The teacher was very knowledgeable regarding the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Small group (4 trainees), so we could progress together. The trainer could also help everybody.
- ICE International Copyright Enterprise Germany GmbH
Ajay was very friendly, helpful and also knowledgeable about the topic he was discussing.
Biniam Guulay - ICE International Copyright Enterprise Germany GmbH
The lab exercises. Applying the theory from the first day in subsequent days.
- Dell
The trainer was passionate and knew his subject well. I appreciate his help, his answers to all our questions, and the cases he suggested.
Doing similar exercises in different ways really helps in understanding what each component (Hadoop/Spark, standalone/cluster) can do on its own and together. It gave me ideas on how I should test my application on my local machine when I develop vs. when it is deployed on a cluster.
Thomas Carcaud - IT Frankfurt GmbH
Got to learn Spark Streaming, Databricks, and AWS Redshift.
Lim Meng Tee - Jobstreet.com Shared Services Sdn. Bhd.
The content and the knowledge.
Jobstreet.com Shared Services Sdn. Bhd.
It was very informative. I've had very little experience with Spark before and so far this course has provided a very good introduction to the subject.
Intelligent Medical Objects
It was great to get an understanding of what is going on under the hood of Spark. Knowing what's going on under the hood helps to better understand why your code is or is not doing what you expect it to do. A lot of the training was hands on which is always great and the section on optimizations was exceptionally relevant to my current work which was nice.
Intelligent Medical Objects
This is a great class! I most appreciate that Andras explains very clearly what Spark is all about, where it came from, and what problems it is able to solve. Much better than other introductions I've seen that just dive into how to use it. Andras has a deep knowledge of the topic and explains things very well.
Intelligent Medical Objects
The live examples that were given and showed the basic aspects of Spark.
Intelligent Medical Objects
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
Having hands on session / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
The trainer adjusted the training slightly based on audience requests, shedding some light on a few different topics that we had requested.
Intelligent Medical Objects
His pace was great. I loved the fact that he went into theory too, so that I understand WHY I would do the things he is asking.
Intelligent Medical Objects
Related Courses
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP
21 hours
This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and
Apache Spark MLlib
35 hours
MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative
Alluxio: Unifying Disparate Storage Systems
7 hours
Alluxio is an open-source virtual distributed storage system that unifies disparate storage systems and enables applications to interact with data at memory speed. It is used by companies such as Intel, Baidu and Alibaba. In this instructor-led,
Big Data Analytics in Health
21 hours
Big data analytics involves the process of examining large amounts of varied data sets in order to uncover correlations, hidden patterns, and other useful insights. The health industry has massive amounts of complex heterogeneous medical and
Apache Spark for .NET Developers
21 hours
Apache Spark is a distributed processing engine for analyzing very large data sets. It can process data in batches and real-time, as well as carry out machine learning, ad-hoc queries, and graph processing. .NET for Apache Spark is a free,
Apache Spark Fundamentals
21 hours
Apache Spark is an analytics engine designed to distribute data across a cluster in order to process it in parallel. It contains modules for streaming, SQL, machine learning and graph processing. This instructor-led, live training (online or
Apache Spark in the Cloud
21 hours
Apache Spark's learning curve rises slowly at the beginning, and it takes a lot of effort to get the first return. This course aims to jump through the first tough part. After taking this course the participants will understand the
Spark for Developers
21 hours
OBJECTIVE: This course will introduce Apache Spark. The students will learn how Spark fits into the Big Data ecosystem, and how to use Spark for data analysis. The course covers Spark shell for interactive data analysis, Spark
Apache Spark SQL
7 hours
Spark SQL is Apache Spark's module for working with structured and unstructured data. Spark SQL provides information about the structure of the data as well as the computation being performed. This information can be used to perform
Introduction to Graph Computing
28 hours
Many real world problems can be described in terms of graphs. For example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set
A Practical Introduction to Stream Processing
21 hours
Stream Processing refers to the real-time processing of "data in motion", that is, performing computations on data as it is being received. Such data is read as continuous streams from data sources such as sensor events, website user
Magellan: Geospatial Analytics on Spark
14 hours
Magellan is an open-source distributed execution engine for geospatial analytics on big data. Implemented on top of Apache Spark, it extends Spark SQL and provides a relational abstraction for geospatial analytics. This instructor-led, live
SMACK Stack for Data Science
14 hours
SMACK is a collection of data platform software, namely Apache Spark, Apache Mesos, Apache Akka, Apache Cassandra, and Apache Kafka. Using the SMACK stack, users can create and scale data processing platforms. This instructor-led, live training
Python and Spark for Big Data (PySpark)
21 hours
Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this
Apache Spark Streaming with Scala
21 hours
Scala is a condensed version of Java for large scale functional and object-oriented programming. Apache Spark Streaming is an extended component of the Spark API for processing big data sets as real-time streams. Together, Spark Streaming and