Course Outline
Introduction
- Apache Beam vs MapReduce, Spark Streaming, Kafka Streams, Storm and Flink
Installing and Configuring Apache Beam
Overview of Apache Beam Features and Architecture
- Beam Model, SDKs, Beam Pipeline Runners
- Distributed processing back-ends
Understanding the Apache Beam Programming Model
- How a pipeline is executed
Running a sample pipeline
- Preparing a WordCount pipeline
- Executing the Pipeline locally
Designing a Pipeline
- Planning the structure, choosing the transforms, and determining the input and output methods
Creating the Pipeline
- Writing the driver program and defining the pipeline
- Using Apache Beam classes
- Data sets, transforms, I/O, data encoding, etc.
Executing the Pipeline
- Executing the pipeline locally, on remote machines, and on a public cloud
- Choosing a runner
- Runner-specific configurations
Testing and Debugging Apache Beam
- Using type hints to emulate static typing
- Managing Python Pipeline Dependencies
Processing Bounded and Unbounded Datasets
- Windowing and Triggers
Making Your Pipelines Reusable and Maintainable
Create New Data Sources and Sinks
- Apache Beam Source and Sink API
Integrating Apache Beam with other Big Data Systems
- Apache Hadoop, Apache Spark, Apache Kafka
Troubleshooting
Summary and Conclusion
Requirements
- Experience with Python Programming.
- Experience with the Linux command line.
Audience
- Developers
Testimonials
Recalling/reviewing key points of the topics discussed.
Paolo Angelo Gaton - SMS Global Technologies Inc.
The lab exercises. Applying the theory from the first day in subsequent days.
- Dell
The trainer was passionate and knowledgeable about what he said. I appreciate his help; he answered all our questions and suggested cases.