
Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Overview of Python and Scala

Foundational Concepts (Theory):

  • Architecture
  • RDDs (Resilient Distributed Datasets)
  • Transformations and Actions
  • Stages, Tasks, and Dependencies

Mastering the Basics via Databricks (Hands-on Workshop):

  • Exercises using the RDD API
      • Basic action and transformation functions
      • PairRDDs
      • Joins
      • Caching strategies
  • Exercises using the DataFrame API
      • Spark SQL
      • DataFrame operations: select, filter, group, sort
      • UDFs (User-Defined Functions)
  • Introduction to the Dataset API
  • Streaming

Mastering Cloud Deployment via AWS (Hands-on Workshop):

  • Fundamentals of AWS Glue
  • Comparing AWS EMR and AWS Glue
  • Example jobs run on both environments
  • Advantages and disadvantages of each platform

Additional Topics:

  • Introduction to Apache Airflow orchestration
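A minimal sketch of what Airflow orchestration of a Spark job looks like, assuming Airflow 2.x; the DAG id, schedule, and `spark-submit` command are illustrative placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A declarative DAG definition: Airflow schedules the task, Spark does the work.
with DAG(
    dag_id="daily_spark_job",          # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    submit = BashOperator(
        task_id="submit_spark_job",
        bash_command="spark-submit --master yarn etl_job.py",  # placeholder job
    )
```

Dependencies between tasks would be expressed with the `>>` operator, e.g. `extract >> transform >> load`.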

Requirements

  • Programming skills (preferably in Python or Scala)
  • Basic knowledge of SQL

Duration: 21 Hours
