Course Outline


Introduction

  • Apache Beam vs MapReduce, Spark Streaming, Kafka Streams, Storm, and Flink

Installing and Configuring Apache Beam

Overview of Apache Beam Features and Architecture

  • Beam Model, SDKs, Beam Pipeline Runners
  • Distributed processing back-ends

Understanding the Apache Beam Programming Model

  • How a pipeline is executed

Running a Sample Pipeline

  • Preparing a WordCount pipeline
  • Executing the pipeline locally

Designing a Pipeline

  • Planning the structure, choosing the transforms, and determining the input and output methods

Creating the Pipeline

  • Writing the driver program and defining the pipeline
  • Using Apache Beam classes
  • Data sets, transforms, I/O, data encoding, etc.

Executing the Pipeline

  • Executing the pipeline locally, on remote machines, and on a public cloud
  • Choosing a runner
  • Runner-specific configurations

Testing and Debugging Apache Beam

  • Using type hints to emulate static typing
  • Managing Python pipeline dependencies

Processing Bounded and Unbounded Datasets

  • Windowing and Triggers

Making Your Pipelines Reusable and Maintainable

Creating New Data Sources and Sinks

  • Apache Beam Source and Sink API

Integrating Apache Beam with other Big Data Systems

  • Apache Hadoop, Apache Spark, Apache Kafka


Summary and Conclusion


Requirements

  • Experience with Python programming
  • Experience with the Linux command line


Audience

  • Developers

Duration

  14 hours


Related Courses

Samza for Stream Processing

  14 hours

Tigon: Real-time Streaming for the Real World

  14 hours

Real-Time Stream Processing with MapR

  7 hours

Stream Processing with Kafka Streams

  7 hours

A Practical Introduction to Stream Processing

  21 hours

Building Kafka Solutions with Confluent

  14 hours

Apache Kafka for Python Programmers

  7 hours

Apache Flink Fundamentals

  28 hours

Apache NiFi for Administrators

  21 hours

Apache NiFi for Developers

  7 hours

Apache Storm

  28 hours

Apache Apex: Processing Big Data-in-Motion

  21 hours

Apache Ignite for Developers

  14 hours

Confluent KSQL

  7 hours

Spark Streaming with Python and Kafka

  7 hours