- Overview of Spark and Hadoop features and architecture
- Understanding big data
- Python programming basics
- Setting up Python, Spark, and Hadoop
- Understanding data structures in Python
- Understanding PySpark API
- Understanding HDFS and MapReduce
Integrating Spark and Hadoop with Python
- Implementing Spark RDD in Python
- Processing data using MapReduce
- Creating distributed datasets in HDFS
Machine Learning with Spark MLlib
Processing Big Data with Spark Streaming
Working with Recommender Systems
Working with Kafka, Sqoop, and Flume
Apache Mahout with Spark and Hadoop
Summary and Next Steps
- Experience with Spark and Hadoop
- Python programming experience
- Data scientists
The fact that all the data and software were ready to use on an already prepared VM, provided by the trainer on external disks.
I mostly liked that the trainer gave real-life examples.
I genuinely enjoyed the trainer's great competence.
I genuinely enjoyed the many hands-on sessions.
It was very hands-on; we spent half the time actually doing things in Cloudera/Hadoop, running different commands, checking the system, and so on. The extra materials (books, websites, etc.) were really appreciated, as we will have to continue to learn. The installations were quite fun and very handy, and the cluster setup from scratch was really good.
Lots of hands-on exercises.
Ambari management tool. Ability to discuss practical Hadoop experiences from business cases other than telecom.
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Training topics and engagement of the trainer
- Izba Administracji Skarbowej w Lublinie
Communication with people attending training.
Andrzej Szewczuk - Izba Administracji Skarbowej w Lublinie
The practical exercises; the theory was also presented well by Ajay.
Dominik Mazur - Capgemini Polska Sp. z o.o.
- Capgemini Polska Sp. z o.o.
usefulness of exercises
- Algomine sp.z.o.o sp.k.
I found the training good and very informative, but it could have been spread over 4 or 5 days, allowing us to go into more detail on different aspects.
- Veterans Affairs Canada
I really enjoyed the training. Anton has a lot of knowledge and laid out the necessary theory in a very accessible way. It is great that the training included a lot of interesting exercises, so we were working with the technology from the very beginning.
Szymon Dybczak - Algomine sp.z.o.o sp.k.
I found this course gave a great overview and quickly touched some areas I wasn't even considering.
- Veterans Affairs Canada
I genuinely liked the exercises on the cluster, which let us see the performance of nodes across the cluster and the extended functionality.
The trainer's in-depth knowledge of the subject.
Ajay was a very experienced consultant and was able to answer all our questions and even made suggestions on best practices for the project we are currently engaged in.
That I had it in the first place.
Peter Scales - CACI Ltd
The NiFi workflow exercises
answers to our specific questions
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP (21 hours)
This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and
Apache Spark MLlib (35 hours)
MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative
Scaling Data Analysis with Python and Dask (14 hours)
Dask is a flexible and high-performance Python library for parallel computing. It scales and accelerates big data processing with other Python-based data science libraries, such as Pandas, NumPy, and Scikit-Learn. This instructor-led, live
Data Analysis with Python, Pandas, and Numpy (14 hours)
Pandas is a Python package that provides data structures for working with structured (tabular, multidimensional, potentially heterogeneous) and time series data.
Accelerating Python Pandas Workflows with Modin (14 hours)
Modin is a parallel data frame system designed to speed up Pandas workflows. It can be used to handle large datasets, leveraging Ray or Dask as the backend framework for distributed computing in Python. This instructor-led, live training (online
Machine Learning with Python and Pandas (14 hours)
Pandas is a Python library for data manipulation and analysis. Using Pandas, users can perform predictive analysis through machine learning. This instructor-led, live training (online or onsite) is aimed at data scientists who wish to use Pandas
FARM (FastAPI, React, and MongoDB) Full Stack Development (14 hours)
FARM (FastAPI, React, and MongoDB) is similar to MERN, but performs faster with Python and FastAPI replacing Node.js and Express as the backend. FastAPI is a high-performance Python web framework used by top companies, such as Microsoft, Uber, and
Developing APIs with Python and FastAPI (14 hours)
FastAPI is an open source, high-performance web framework for building APIs with Python. It is used by many large companies, such as Uber, Netflix, and Microsoft. This instructor-led, live training (online or onsite) is aimed at developers who
Web application development with Flask (14 hours)
This practical course is addressed to Python developers that want to create and maintain their first web applications. It is also addressed to people who are already familiar with other web frameworks such as Django or Web2py, and want to learn
Advanced Flask (14 hours)
Flask is a micro-framework for developing web applications in Python. Unlike other frameworks, Flask does not have any dependencies on external libraries, making it lightweight and fast. This instructor-led, live training (online or onsite) is
Build REST APIs with Python and Flask (14 hours)
Flask is a micro-framework for developing web services in Python. Flask, unlike other frameworks, does not have any dependencies on external libraries, making it lightweight and fast. This instructor-led, live training (online or onsite) is aimed
Introduction to Graph Computing (28 hours)
Many real-world problems can be described in terms of graphs. Examples include the Web graph, the social network graph, the train network graph, and the language graph. These graphs tend to be extremely large; processing them requires a specialized set
Game Development with PyGame (7 hours)
PyGame is an open source library of Python modules for developing game applications and programs. It is lightweight, easy to use, and compatible with any operating system or platform. This instructor-led, live training (online or onsite) is aimed
Scientific Computing with Python SciPy (7 hours)
SciPy is an open source Python library for scientific, mathematical, and technical computing. It is built on the NumPy extension, providing a wide range of functionalities for performing complex numerical operations. This instructor-led, live
Python and Spark for Big Data (PySpark) (21 hours)
Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this