Course Outline
Introduction
- Apache Beam vs MapReduce, Spark Streaming, Kafka Streaming, Storm and Flink
Installing and Configuring Apache Beam
Overview of Apache Beam Features and Architecture
- Beam Model, SDKs, Beam Pipeline Runners
- Distributed processing back-ends
Understanding the Apache Beam Programming Model
- How a pipeline is executed
Running a sample pipeline
- Preparing a WordCount pipeline
- Executing the Pipeline locally
Designing a Pipeline
- Planning the structure, choosing the transforms, and determining the input and output methods
Creating the Pipeline
- Writing the driver program and defining the pipeline
- Using Apache Beam classes
- Data sets, transforms, I/O, data encoding, etc.
Executing the Pipeline
- Executing the pipeline locally, on remote machines, and on a public cloud
- Choosing a runner
- Runner-specific configurations
Testing and Debugging Apache Beam
- Using type hints to emulate static typing
- Managing Python Pipeline Dependencies
Processing Bounded and Unbounded Datasets
- Windowing and Triggers
Making Your Pipelines Reusable and Maintainable
Create New Data Sources and Sinks
- Apache Beam Source and Sink API
Integrating Apache Beam with other Big Data Systems
- Apache Hadoop, Apache Spark, Apache Kafka
Troubleshooting
Summary and Conclusion
Requirements
- Experience with Python Programming.
- Experience with the Linux command line.
Audience
- Developers
Testimonials
Recalling/reviewing keypoints of the topics discussed.
Paolo Angelo Gaton - SMS Global Technologies Inc.
Sufficient hands on, trainer is knowledgable
Chris Tan
The trainer was passionate and well-known what he said I appreciate his help and answers all our questions and suggested cases.
The lab exercises. Applying the theory from the first day in subsequent days.
- Dell
The trainer was passionate and well-known what he said I appreciate his help and answers all our questions and suggested cases.