Course Outline


In-Depth Review of Scala Programming

  • Syntax and structure
  • Flow control and functions

Spark Internals

  • Resilient Distributed Datasets (RDDs)
  • From Spark script to execution graph to cluster

Overview of Spark Streaming

  • Streaming architecture
  • Intervals in streaming
  • Fault tolerance

Preparing the Development Environment

  • Installing and configuring Apache Spark
  • Installing and configuring the Scala IDE
  • Installing and configuring the JDK

Spark Streaming: Beginner to Advanced

  • Working with key/value RDDs
  • Filtering RDDs
  • Improving Spark scripts with regular expressions
  • Sharing data on a cluster
  • Working with network data sets
  • Implementing breadth-first search (BFS) algorithms
  • Creating Spark driver scripts
  • Tracking in real time with scripts
  • Writing continuous applications
  • Streaming linear regression
  • Using the Spark Machine Learning Library (MLlib)

Spark and Clusters

  • Bundling dependencies and Spark scripts using the SBT tool
  • Using Amazon EMR to illustrate clusters
  • Optimizing by partitioning RDDs
  • Using Spark logs

Integration in Spark Streaming

  • Integrating Apache Kafka and working with Kafka topics
  • Integrating Apache Flume and working with pull-based/push-based Flume configurations
  • Writing a custom receiver class
  • Integrating Cassandra and exposing data as real-time services

In Production

  • Packaging an application and running it with spark-submit
  • Troubleshooting, tuning, and debugging Spark jobs and clusters

Summary and Conclusion


Requirements

  • Programming and scripting experience

Audience

  • Software engineers

Duration

  21 hours


Related Courses

Python and Spark for Big Data (PySpark)

  21 hours

Introduction to Graph Computing

  28 hours

Apache Spark MLlib

  35 hours

Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP

  21 hours

Programming in Scala

  14 hours

Machine Learning Fundamentals with Scala and Apache Spark

  14 hours

Scala: Advanced Object-Functional Programming

  14 hours

Scala: Advanced Functional Programming

  14 hours

Property Based Testing with ScalaCheck

  21 hours

Akka - from Beginner to Intermediate

  21 hours

Spark for Developers

  21 hours

Hortonworks Data Platform (HDP) for Administrators

  21 hours

Magellan: Geospatial Analytics on Spark

  14 hours

Alluxio: Unifying Disparate Storage Systems

  7 hours

Apache Spark SQL

  7 hours