Unified Batch and Stream Processing with Apache Beam Training Course

Apache Beam is an open-source, unified programming model for defining and executing parallel data processing pipelines. Its main strength is that one model handles both batch and streaming pipelines, which can then be executed on distributed processing back-ends such as Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam is particularly useful for ETL tasks: moving data between storage systems and sources, transforming it into a more suitable format, and loading it into new systems.

In this instructor-led, live training (on-site or remote), participants will learn how to use the Apache Beam SDKs in a Java or Python application to define a data processing pipeline that splits a large dataset into smaller chunks, which are then processed independently and in parallel.

By the end of this training, participants will be able to:

Install and configure Apache Beam.
Utilize a single programming model within their Java or Python application to perform both batch and stream processing.
Run pipelines across multiple environments.

Course Format

A combination of lectures, discussions, exercises, and extensive hands-on practice.

Note

A Scala version of this course is planned. Please contact us to arrange it.

This course is available as on-site live training in the United Arab Emirates or as online live training.

Course Outline

Introduction

Apache Beam vs. MapReduce, Spark Streaming, Kafka Streams, Storm, and Flink

Installing and Configuring Apache Beam

Overview of Apache Beam Features and Architecture

Beam Model, SDKs, Beam Pipeline Runners
Distributed processing back-ends

Understanding the Apache Beam Programming Model

How a pipeline is executed

Running a sample pipeline

Preparing a WordCount pipeline
Executing the Pipeline locally

Designing a Pipeline

Planning the structure, choosing the transforms, and determining the input and output methods

Creating the Pipeline

Writing the driver program and defining the pipeline
Using Apache Beam classes
Data sets, transforms, I/O, data encoding, etc.

Executing the Pipeline

Executing the pipeline locally, on remote machines, and on a public cloud
Choosing a runner
Runner-specific configurations

Testing and Debugging Apache Beam

Using type hints to emulate static typing
Managing Python Pipeline Dependencies

Processing Bounded and Unbounded Datasets

Windowing and Triggers

Making Your Pipelines Reusable and Maintainable

Creating New Data Sources and Sinks

Apache Beam Source and Sink API

Integrating Apache Beam with other Big Data Systems

Apache Hadoop, Apache Spark, Apache Kafka

Troubleshooting

Summary and Conclusion

Requirements

Experience with Python Programming.
Experience with the Linux command line.

Audience

Developers

Duration

14 hours
