Course Outline

Introduction

  • Overview of Spark and Hadoop features and architecture
  • Understanding big data
  • Python programming basics

Getting Started

  • Setting up Python, Spark, and Hadoop
  • Understanding data structures in Python
  • Understanding PySpark API
  • Understanding HDFS and MapReduce

Integrating Spark and Hadoop with Python

  • Implementing Spark RDD in Python
  • Processing data using MapReduce
  • Creating distributed datasets in HDFS
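
The MapReduce model behind the items above can be illustrated without a cluster. The following is a minimal, single-process Python sketch of a word count, assuming plain in-memory strings as input. The function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and are not part of Hadoop or PySpark. The sketch shows the three stages that a real framework distributes across nodes: mapping records to key/value pairs, grouping the pairs by key, and reducing each group.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for each whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce stages."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce in a single process."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Spark and Hadoop", "Hadoop stores data in HDFS", "Spark processes data"]
    print(sorted(word_count(docs).items()))
```

On a cluster, the same pipeline is typically written with the PySpark RDD API, for example `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where Spark performs the shuffle between `map` and `reduceByKey`. Here `sc` (a `SparkContext`) and `path` are assumed placeholders.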

Machine Learning with Spark MLlib

Processing Big Data with Spark Streaming

Working with Recommender Systems
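
The outline does not say which recommendation technique the course covers. Spark MLlib itself provides matrix factorization via alternating least squares (`pyspark.ml.recommendation.ALS`). As a simpler, swapped-in illustration of the underlying idea, here is a minimal pure-Python sketch of user-based collaborative filtering with cosine similarity. It assumes the ratings fit in a small in-memory dictionary. The sample data and the function names `cosine` and `recommend` are hypothetical.

```python
import math

# Hypothetical rating store: user -> {item: rating}.
Ratings = dict[str, dict[str, float]]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (missing items count as 0)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> list[tuple[str, float]]:
    """Predict scores for items `user` has not rated, as a similarity-weighted
    average of other users' ratings, and return the top_n highest."""
    target = ratings[user]
    weighted_sum: dict[str, float] = {}
    sim_total: dict[str, float] = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(target, other_ratings)
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for item, rating in other_ratings.items():
            if item in target:
                continue  # only score items the user has not rated yet
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            sim_total[item] = sim_total.get(item, 0.0) + sim
    predictions = [(item, weighted_sum[item] / sim_total[item]) for item in weighted_sum]
    # Highest predicted score first; ties broken alphabetically for determinism.
    predictions.sort(key=lambda p: (-p[1], p[0]))
    return predictions[:top_n]


if __name__ == "__main__":
    ratings: Ratings = {
        "alice": {"m1": 5, "m2": 3},
        "bob": {"m1": 5, "m2": 3, "m3": 4},
        "carol": {"m2": 1, "m4": 5},
    }
    print(recommend(ratings, "alice"))
```

With the sample data, `alice` receives predictions only for `m3` and `m4`, the items she has not rated. Because each prediction is a weighted average, an item rated by a single similar user simply inherits that user's rating regardless of how similar they are. This is one limitation of the simple scheme compared with model-based approaches such as ALS.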

Working with Kafka, Sqoop, and Flume

Apache Mahout with Spark and Hadoop
Summary and Next Steps

Requirements

  • Experience with Spark and Hadoop
  • Python programming experience

Audience

  • Data scientists
  • Developers

Duration

  21 hours

Related Courses

Python and Spark for Big Data (PySpark)

  21 hours

Introduction to Graph Computing

  28 hours

Apache Spark MLlib

  35 hours

Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP

  21 hours

Data Analysis with Python, Pandas, and Numpy

  14 hours

Machine Learning with Python and Pandas

  14 hours

Accelerating Python Pandas Workflows with Modin

  14 hours

Scaling Data Analysis with Python and Dask

  14 hours

Developing APIs with Python and FastAPI

  14 hours

FARM (FastAPI, React, and MongoDB) Full Stack Development

  14 hours

Scientific Computing with Python SciPy

  7 hours

Game Development with PyGame

  7 hours

Web application development with Flask

  14 hours

Build REST APIs with Python and Flask

  14 hours

Advanced Flask

  14 hours