Course Outline
Introduction
- Overview of Spark and Hadoop features and architecture
- Understanding big data
- Python programming basics
Getting Started
- Setting up Python, Spark, and Hadoop
- Understanding data structures in Python
- Understanding PySpark API
- Understanding HDFS and MapReduce
Integrating Spark and Hadoop with Python
- Implementing Spark RDD in Python
- Processing data using MapReduce
- Creating distributed datasets in HDFS
Machine Learning with Spark MLlib
Processing Big Data with Spark Streaming
Working with Recommender Systems
Working with Kafka, Sqoop, and Flume
Apache Mahout with Spark and Hadoop
Troubleshooting
Summary and Next Steps
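To give a feel for the "Processing data using MapReduce" topic in the outline above, here is a minimal sketch of the map, shuffle, and reduce steps as a word count in plain Python. It does not use Hadoop or PySpark and is not taken from the course materials; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, standing in for lines read from a file in HDFS.
sample_lines = ["spark and hadoop", "spark with python"]
counts = word_count(sample_lines)
print(counts)
```

On a real cluster the same pattern is usually written against a PySpark RDD with `flatMap`, `map`, and `reduceByKey`, which distribute these steps across worker nodes instead of running them in a single process.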
Requirements
- Experience with Spark and Hadoop
- Python programming experience
Audience
- Data scientists
- Developers
Testimonials
The fact that all the data and software were ready to use on a prepared VM, provided by the trainer on external disks.
vyzVoice
I mostly liked the trainer giving real-life examples.
Simon Hahn
I genuinely enjoyed the trainer's strong competence.
Grzegorz Gorski
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczątka
It was very hands-on; we spent half the time actually doing things in Cloudera/Hadoop, running different commands, checking the system, and so on. The extra materials (books, websites, etc.) were really appreciated, as we will have to continue to learn. The installations were quite fun and very handy, and the cluster setup from scratch was really good.
Ericsson
Lots of hands-on exercises.
- Ericsson
Ambari management tool. Ability to discuss practical Hadoop experiences from business cases other than telecom.
- Ericsson
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Training topics and engagement of the trainer
- Izba Administracji Skarbowej w Lublinie
Communication with people attending training.
Andrzej Szewczuk - Izba Administracji Skarbowej w Lublinie
The practical, hands-on work; the theory was also presented well by Ajay.
Dominik Mazur - Capgemini Polska Sp. z o.o.
Exercises
- Capgemini Polska Sp. z o.o.
usefulness of exercises
- Algomine sp.z.o.o sp.k.
I found the training good and very informative, but it could have been spread over 4 or 5 days, allowing us to go into more detail on different aspects.
- Veterans Affairs Canada
I really enjoyed the training. Anton has a lot of knowledge and laid out the necessary theory in a very accessible way. It is great that the training included a lot of interesting exercises, so we were in contact with the technology from the very beginning.
Szymon Dybczak - Algomine sp.z.o.o sp.k.
I found this course gave a great overview and quickly touched some areas I wasn't even considering.
- Veterans Affairs Canada
I genuinely liked the practical exercises with the cluster, which let us see the performance of nodes across the cluster and the extended functionality.
CACI Ltd
The trainer's in-depth knowledge of the subject
CACI Ltd
Ajay was a very experienced consultant and was able to answer all our questions and even made suggestions on best practices for the project we are currently engaged on.
CACI Ltd
That I had it in the first place.
Peter Scales - CACI Ltd
The NiFi workflow exercises
Politiets Sikkerhetstjeneste
answers to our specific questions
MOD BELGIUM
Related Courses
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP
21 hours. This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and
Apache Spark MLlib
35 hours. MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative
Scaling Data Analysis with Python and Dask
14 hours. Dask is a flexible and high-performance Python library for parallel computing. It scales and accelerates big data processing with other Python-based data science libraries, such as Pandas, Numpy, and Scikit-Learn. This instructor-led, live
Data Analysis with Python, Pandas, and Numpy
14 hours. Pandas is a Python package that provides data structures for working with structured (tabular, multidimensional, potentially heterogeneous) and time series data.
Accelerating Python Pandas Workflows with Modin
14 hours. Modin is a parallel data frame system designed to speed up Pandas workflows. It can be used to handle large datasets, leveraging Ray or Dask as the backend framework for distributed computing in Python. This instructor-led, live training (online
Machine Learning with Python and Pandas
14 hours. Pandas is a Python library for data manipulation and analysis. Using Pandas, users can perform predictive analysis through machine learning. This instructor-led, live training (online or onsite) is aimed at data scientists who wish to use Pandas
FARM (FastAPI, React, and MongoDB) Full Stack Development
14 hours. FARM (FastAPI, React, and MongoDB) is similar to MERN, but performs faster with Python and FastAPI replacing Node.js and Express as the backend. FastAPI is a high-performance Python web framework used by top companies, such as Microsoft, Uber, and
Developing APIs with Python and FastAPI
14 hours. FastAPI is an open source, high-performance web framework for building APIs with Python. It is used by many large companies, such as Uber, Netflix, and Microsoft. This instructor-led, live training (online or onsite) is aimed at developers who
Web application development with Flask
14 hours. This practical course is addressed to Python developers who want to create and maintain their first web applications. It is also addressed to people who are already familiar with other web frameworks such as Django or Web2py, and want to learn
Advanced Flask
14 hours. Flask is a micro-framework for developing web applications in Python. Unlike other frameworks, Flask does not have any dependencies on external libraries, making it lightweight and fast. This instructor-led, live training (online or onsite) is
Build REST APIs with Python and Flask
14 hours. Flask is a micro-framework for developing web services in Python. Flask, unlike other frameworks, does not have any dependencies on external libraries, making it lightweight and fast. This instructor-led, live training (online or onsite) is aimed
Introduction to Graph Computing
28 hours. Many real-world problems can be described in terms of graphs: for example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set
Game Development with PyGame
7 hours. PyGame is an open source library of Python modules for developing game applications and programs. It is lightweight, easy to use, and compatible with any operating system or platform. This instructor-led, live training (online or onsite) is aimed
Scientific Computing with Python SciPy
7 hours. SciPy is an open source Python library for scientific, mathematical, and technical computing. It is built on the NumPy extension, providing a wide range of functionalities for performing complex numerical operations. This instructor-led, live
Python and Spark for Big Data (PySpark)
21 hours. Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this