- Hadoop history, concepts
- High level architecture
- Hadoop myths
- Hadoop challenges (hardware / software)
- Labs: discuss your Big Data projects and problems
- Planning and installation
- Selecting software, Hadoop distributions
- Sizing the cluster, planning for growth
- Selecting hardware and network
- Rack topology
- Directory structure, logs
- Labs: cluster install, run performance benchmarks
- HDFS operations
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring
- Command-line and browser-based administration
- Adding storage, replacing defective drives
- Labs: getting familiar with HDFS command lines
- Data ingestion
- Flume for logs and other data ingestion into HDFS
- Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
- Hadoop data warehousing with Hive
- Copying data between clusters (distcp)
- Using S3 as complementary to HDFS
- Data ingestion best practices and architectures
- Labs: setting up and using Flume and Sqoop
- MapReduce operations and administration
- Parallel computing before MapReduce: compare HPC vs Hadoop administration
- MapReduce cluster loads
- Nodes and daemons (JobTracker, TaskTracker)
- MapReduce UI walk through
- MapReduce configuration
- Job config
- Optimizing MapReduce
- Fool-proofing MR: what to tell your programmers
- Labs: running MapReduce examples
- YARN: new architecture and new capabilities
- YARN design goals and implementation architecture
- New actors: ResourceManager, NodeManager, ApplicationMaster
- Installing YARN
- Job scheduling under YARN
- Labs: investigate job scheduling
- Advanced topics
- Hardware monitoring
- Cluster monitoring
- Adding and removing servers, upgrading Hadoop
- Backup, recovery and business continuity planning
- Oozie job workflows
- Hadoop high availability (HA)
- Hadoop Federation
- Securing your cluster with Kerberos
- Labs: set up monitoring
- Optional tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5)
- Ambari for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)
- comfortable with basic Linux system administration
- basic scripting skills
Prior knowledge of Hadoop and distributed computing is not required; both are introduced and explained in the course.
Zero install: there is no need to install Hadoop software on students’ machines. A working Hadoop cluster will be provided for students.
Students will need the following
I found this course gave a great overview and quickly touched some areas I wasn't even considering.
- Veterans Affairs Canada
I found the training good and very informative, but it could have been spread over 4 or 5 days, allowing us to go into more detail on different aspects.
- Veterans Affairs Canada
Ambari management tool. Ability to discuss practical Hadoop experiences from business cases other than telecom.
Lot of hands-on exercises.
It was very hands-on; we spent half the time actually doing things in Cloudera/Hadoop, running different commands, checking the system, and so on. The extra materials (books, websites, etc.) were really appreciated, as we will have to continue to learn. The installations were quite fun and very handy, and the cluster setup from scratch was really good.
Many hands-on sessions.
Great competence of the trainer
The trainer gives real-life examples
Practical ways of doing things; the theory was also presented well by Ajay
Dominik Mazur - Capgemini Polska Sp. z o.o.
The fact that all the data and software was ready to use on an already prepared VM, provided by the trainer in external disks.
I like how he was able to elaborate about NiFi and how powerful it is. You can basically use it for any infrastructure and use many different computer languages. Also, I was glad we were able to fix the NiFi cert renewal issue we were having with the Truststore.
Joachim Martin - Jacob Jaskolka, BHG Financial
General knowledge and the possibilities that the training offered in terms of the tool.
Nalfis Tobar - Jacob Jaskolka, BHG Financial
The working sessions where we worked on real issues we are trying to solve and built out solutions together.
Jacob Jaskolka, BHG Financial
Isaac Hastings, New Zealand Defence Force
Dwayne McDonald - Isaac Hastings, New Zealand Defence Force
The virtual environment working well and the trainer's positive attitude
Wojciech Lukawski - Orsted Polska sp. z o.o.
I liked trainer's attitude and choice of examples. Trainer was very willing to help and answer questions. Trainer tried to go with as many examples as possible, even though we were short on time.
Waldemar Sobiecki - Orsted Polska sp. z o.o.
Exercises, working with actual NiFi. It was working very well - working on live virtual machines.
Orsted Polska sp. z o.o.
The trainer is very polite, tolerant, helpful. I was not afraid to ask for help when I couldn't handle something myself. I like that we did most of the exercises together without splitting into separate groups or working by ourselves. That way we could ask questions on a regular basis.
Elżbieta Doniek - Orsted Polska sp. z o.o.
Very practical, a lot of exercises during the training
Andrii Feshchenko - Orsted Polska sp. z o.o.
Developing a custom Apache NiFi processor using JSON provided a useful demonstration of how we can use NiFi to transform data before forwarding it to our analytical tools.
Kenny MacLeod - MOD A BLOCK
James answered my every question, was extremely patient and explained everything to me. NiFi was Greek and Latin to me, and I have learnt what a processor, a funnel and the root process are, from beginner level to advanced level.
Firdous Hashim Ali - MOD A BLOCK
That I had it in the first place.
Peter Scales - CACI Ltd
Work exercises with cluster to see performance of nodes across cluster and extended functionality
I liked the VM very much. The teacher was very knowledgeable regarding the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
The trainer was always open for questions and willing to answer and explain everything. He seems to have very good and deep knowledge of what he is teaching. We were able to focus more on topics that might bring value for us since we were only two students.
DEVK Deutsche Eisenbahn Versicherung Sach- und HUK-Versicherungsverein a.G.
The trainer is open to questions, and the training is interactive. I like this point. The trainer was able to efficiently manage the participation of remote persons who weren't able to be present in the office.
Arnaud CAPITAINE, Adikteev
It was interesting, got the chance to learn more about machine learning and Spark stack of technologies.
Edina Kiss, Accenture Industrial SS
The fact that we were able to take with us most of the information/course/presentation/exercises done, so that we can look over them and perhaps redo what we didn't understand the first time or improve what we already did.
Raul Mihail Rat - Edina Kiss, Accenture Industrial SS
I liked that it managed to lay the foundations of the topic and go to some quite advanced exercises. Also provided easy ways to write/test the code.
Ionut Goga - Edina Kiss, Accenture Industrial SS
The live examples