A Practical Introduction to Stream Processing Training Course
Stream Processing involves the real-time analysis of "data in motion," which means performing computations on data as it is being received. This data comes from continuous streams such as sensor events, user activity on websites, financial transactions, credit card swipes, and click streams. Stream Processing frameworks can handle large volumes of incoming data and provide immediate insights.
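The core idea — computing results record by record as events arrive, rather than over a completed dataset — can be sketched in a few lines of framework-agnostic Python. The event shape and running-average logic below are illustrative assumptions, not any particular framework's API:

```python
def sensor_events():
    """Simulates an unbounded stream of readings arriving one at a time."""
    for value in [21.0, 22.5, 19.5, 23.0]:
        yield {"sensor": "s1", "temp": value}

def running_average(events):
    """Processes each record as it arrives, emitting an updated aggregate per event."""
    count, total = 0, 0.0
    for event in events:
        count += 1
        total += event["temp"]
        # The insight is available immediately, not only after the stream ends.
        yield total / count

averages = list(running_average(sensor_events()))
```

A real framework adds the parts this sketch omits — partitioning, fault tolerance, and backpressure — but the per-record computation model is the same.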
In this instructor-led training session (held either onsite or remotely), participants will learn how to set up and integrate various Stream Processing frameworks with existing big data storage systems and related software applications and microservices.
By the end of this training, participants will be able to:
- Install and configure different Stream Processing frameworks such as Spark Streaming and Kafka Streams.
- Understand the trade-offs between frameworks and select the one best suited to their specific needs.
- Process data continuously, concurrently, and record by record.
- Integrate Stream Processing solutions with existing databases, data warehouses, and data lakes.
- Integrate the most appropriate stream processing library with enterprise applications and microservices.
Audience
- Developers
- Software architects
Course Format
- The course includes lectures, discussions, exercises, and extensive hands-on practice.
Notes
- If you require a customized training for this course, please contact us to arrange the details.
Course Outline
Introduction
- Stream processing vs batch processing
- Analytics-focused stream processing
Overview of Frameworks and Programming Languages
- Spark Streaming (Scala)
- Kafka Streams (Java)
- Flink
- Storm
- Comparison of Features and Strengths of Each Framework
Overview of Data Sources
- Live data as a series of events over time
- Historical data sources
Deployment Options
- In the cloud (AWS, etc.)
- On-premises (private cloud, etc.)
Getting Started
- Setting up the Development Environment
- Installing and Configuring
- Assessing Your Data Analysis Needs
Operating a Streaming Framework
- Integrating the Streaming Framework with Big Data Tools
- Event Stream Processing (ESP) vs Complex Event Processing (CEP)
- Transforming the Input Data
- Inspecting the Output Data
- Integrating the Stream Processing Framework with Existing Applications and Microservices
Troubleshooting
Summary and Conclusion
Requirements
- Programming experience in any language
- An understanding of Big Data concepts (Hadoop, etc.)
Testimonials (1)
Sufficient hands on, trainer is knowledgeable
Chris Tan
Course - A Practical Introduction to Stream Processing
Related Courses
Administration of Confluent Apache Kafka
21 Hours
Confluent Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant data pipelines and real-time analytics.
This instructor-led, live training (online or onsite) is aimed at intermediate-level system administrators and DevOps professionals who wish to install, configure, monitor, and troubleshoot Confluent Apache Kafka clusters.
By the end of this training, participants will be able to:
- Understand the components and architecture of Confluent Kafka.
- Deploy and manage Kafka brokers, ZooKeeper quorums, and key services.
- Configure advanced features including security, replication, and performance tuning.
- Use management tools to monitor and maintain Kafka clusters.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange the details.
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP
21 Hours
This course targets developers and data scientists interested in integrating AI into their applications. It places particular emphasis on Data Analysis, Distributed AI, and Natural Language Processing.
Unified Batch and Stream Processing with Apache Beam
14 Hours
Apache Beam is an open-source, unified programming model designed for defining and executing parallel data processing pipelines. Its strength lies in its capability to handle both batch and streaming pipelines, with execution supported by various distributed processing back-ends such as Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam is particularly useful for ETL tasks, including moving data between different storage systems and sources, transforming it into a more suitable format, and loading it into new systems.
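The unified-model idea — one pipeline definition that runs unchanged over bounded (batch) and unbounded (streaming) input — can be sketched without the Beam SDK itself. The function names and data below are illustrative assumptions, not Beam API calls:

```python
def pipeline(source):
    """One pipeline definition: parse, filter, transform.
    Each step is applied element-wise, so the same code works on a
    finite list or on a generator standing in for live input."""
    parsed = (int(line) for line in source)
    positive = (n for n in parsed if n > 0)
    doubled = (n * 2 for n in positive)
    # A real runner over a truly unbounded source would emit windowed
    # results instead of collecting a final list.
    return list(doubled)

# Batch: a bounded, in-memory source.
batch_result = pipeline(["1", "-2", "3"])

# "Stream": the same pipeline over a generator of live lines.
def live_lines():
    yield from ["4", "5", "-6"]

stream_result = pipeline(live_lines())
```

Beam's actual SDK expresses these steps as composable transforms and delegates execution to a chosen back-end runner.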
In this instructor-led training session (held either on-site or remotely), participants will learn how to integrate the Apache Beam SDKs within a Java or Python application to define a data processing pipeline that breaks down large datasets into smaller segments for independent and parallel processing.
By the end of this training, participants will be able to:
- Install and configure Apache Beam.
- Utilize a single programming model within their Java or Python application to perform both batch and stream processing.
- Run pipelines across multiple environments.
Course Format
- A combination of lectures, discussions, exercises, and extensive hands-on practice.
Note
- This course will be offered in Scala in the future. Please contact us to arrange the details.
Building Kafka Solutions with Confluent
14 Hours
This instructor-led, live training (delivered online or at your site) is designed for engineers who want to leverage Confluent (a Kafka distribution) to develop and manage a real-time data processing platform for their applications.
By the conclusion of this course, participants will be able to:
- Set up and configure the Confluent Platform.
- Utilize Confluent's management tools and services to simplify Kafka operations.
- Store and process incoming data streams.
- Optimize and manage Kafka clusters effectively.
- Secure their data streams.
Course Format
- Engaging lectures and discussions.
- Ample exercises and practice sessions.
- Practical implementation in a live-lab setting.
Customization Options for the Course
- The course is based on the open-source version of Confluent: Confluent Open Source.
- To arrange for customized training, please contact us to discuss your requirements.
Apache Flink Fundamentals
28 Hours
This instructor-led, live training in the UAE (online or onsite) introduces the principles and approaches behind distributed stream and batch data processing, and walks participants through the creation of a real-time, data streaming application in Apache Flink.
By the end of this training, participants will be able to:
- Set up an environment for developing data analysis applications.
- Understand how Apache Flink's graph-processing library (Gelly) works.
- Package, execute, and monitor Flink-based, fault-tolerant, data streaming applications.
- Manage diverse workloads.
- Perform advanced analytics.
- Set up a multi-node Flink cluster.
- Measure and optimize performance.
- Integrate Flink with different Big Data systems.
- Compare Flink capabilities with those of other big data processing frameworks.
Introduction to Graph Computing
28 Hours
In this instructor-led, live training in the UAE, participants will learn about the technology offerings and implementation approaches for processing graph data. The aim is to identify real-world objects, their characteristics and relationships, then model these relationships and process them as data using a Graph Computing (also known as Graph Analytics) approach. We start with a broad overview and narrow in on specific tools as we step through a series of case studies, hands-on exercises and live deployments.
By the end of this training, participants will be able to:
- Understand how graph data is persisted and traversed.
- Select the best framework for a given task (from graph databases to batch processing frameworks).
- Implement Hadoop, Spark, GraphX and Pregel to carry out graph computing across many machines in parallel.
- View real-world big data problems in terms of graphs, processes and traversals.
Apache Kafka for Python Programmers
7 Hours
This instructor-led, live training in the UAE (online or onsite) is aimed at data engineers, data scientists, and programmers who wish to use Apache Kafka features in data streaming with Python.
By the end of this training, participants will be able to use Apache Kafka to monitor and manage conditions in continuous data streams using Python programming.
Stream Processing with Kafka Streams
7 Hours
Kafka Streams is a client-side library designed for developing applications and microservices that exchange data with a Kafka messaging system. Traditionally, Apache Kafka has depended on Apache Spark or Apache Storm for processing data between message producers and consumers. However, by utilizing the Kafka Streams API within an application, data can be processed directly inside Kafka, eliminating the need to send it to another cluster for processing.
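The topic-in, topic-out model this library is built around can be sketched in plain Python. The in-memory "topics" and the uppercase transform below are illustrative stand-ins, not the Kafka Streams API (which is a Java library):

```python
from collections import defaultdict

# In-memory stand-ins for Kafka topics: each topic is a list of (key, value) records.
topics = defaultdict(list)
topics["orders-raw"] = [("user-1", "widget"), ("user-2", "gadget")]

def process(input_topic, output_topic, transform):
    """Reads each record from the input topic, applies a transform, and
    writes the result to the output topic -- no separate processing cluster."""
    for key, value in topics[input_topic]:
        topics[output_topic].append((key, transform(value)))

process("orders-raw", "orders-normalized", str.upper)
```

In Kafka Streams the same shape appears as a topology of stream transformations between real Kafka topics, running inside the application process.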
In this instructor-led live training session, participants will learn how to incorporate Kafka Streams into a series of sample Java applications that exchange data with Apache Kafka for stream processing.
By the end of this training, participants will be able to:
- Comprehend the features and benefits of Kafka Streams compared to other stream processing frameworks
- Process streaming data directly within a Kafka cluster
- Create Java or Scala applications or microservices that integrate with Kafka and Kafka Streams
- Write succinct code that converts input Kafka topics into output Kafka topics
- Construct, package, and deploy the application
Audience
- Developers
Course Format
- The course includes lectures, discussions, exercises, and extensive hands-on practice.
Notes
- To request a customized training for this course, please contact us to arrange the details.
Confluent KSQL
7 Hours
This instructor-led, live training in the UAE (online or onsite) is aimed at developers who wish to implement Apache Kafka stream processing without writing code.
By the end of this training, participants will be able to:
- Install and configure Confluent KSQL.
- Set up a stream processing pipeline using only SQL commands (no Java or Python coding).
- Carry out data filtering, transformations, aggregations, joins, windowing, and sessionization entirely in SQL.
- Design and deploy interactive, continuous queries for streaming ETL and real-time analytics.
Apache NiFi for Administrators
21 Hours
In this instructor-led, live training in the UAE (onsite or remote), participants will learn how to deploy and manage Apache NiFi in a live lab environment.
By the end of this training, participants will be able to:
- Install and configure Apache NiFi.
- Source, transform and manage data from disparate, distributed data sources, including databases and big data lakes.
- Automate dataflows.
- Enable streaming analytics.
- Apply various approaches for data ingestion.
- Transform Big Data into business insights.
Apache NiFi for Developers
7 Hours
In this instructor-led, live training in the UAE, participants will learn the fundamentals of flow-based programming as they develop a number of demo extensions, components and processors using Apache NiFi.
By the end of this training, participants will be able to:
- Understand NiFi's architecture and dataflow concepts.
- Develop extensions using NiFi and third-party APIs.
- Develop their own custom Apache NiFi processors.
- Ingest and process real-time data from disparate and uncommon file formats and data sources.
Python and Spark for Big Data (PySpark)
21 Hours
In this instructor-led, live training in the UAE, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.
By the end of this training, participants will be able to:
- Learn how to use Spark with Python to analyze Big Data.
- Work on exercises that mimic real world cases.
- Use different tools and techniques for big data analysis using PySpark.
Spark Streaming with Python and Kafka
7 Hours
This instructor-led, live training in the UAE (online or onsite) is aimed at data engineers, data scientists, and programmers who wish to use Spark Streaming features in processing and analyzing real-time data.
By the end of this training, participants will be able to use Spark Streaming to process live data streams for use in databases, filesystems, and live dashboards.
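Spark Streaming's micro-batch approach — grouping live records into short time slices and processing each slice as a small batch — can be sketched in plain Python. The timestamps and per-batch count below are illustrative assumptions; real code would use the pyspark API:

```python
from itertools import groupby

# Timestamped events as they might arrive from a live source: (seconds, value).
events = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (1.8, "c"), (2.4, "a")]

def micro_batches(events, interval=1.0):
    """Groups time-ordered events into fixed slices, then processes each
    slice as a small batch (here, a simple per-batch count)."""
    for window, batch in groupby(events, key=lambda e: int(e[0] // interval)):
        records = [value for _, value in batch]
        yield window, len(records)  # e.g. a count pushed to a live dashboard

counts = dict(micro_batches(events))
```

The same per-batch results could just as well be written to a database or filesystem as to a dashboard, which is the integration pattern this course covers.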
Apache Spark MLlib
35 Hours
MLlib is Spark's machine learning (ML) library, designed to make practical machine learning scalable and easy to use. It encompasses learning algorithms and utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction, along with foundational optimization tools and higher-level pipeline APIs.
The library is divided into two main packages:
- spark.mllib contains the original API, built on top of RDDs.
- spark.ml provides a higher-level API based on DataFrames, facilitating the creation of ML pipelines.
Audience
This course is tailored for engineers and developers looking to leverage an integrated Machine Learning Library within Apache Spark.
Stratio: Rocket and Intelligence Modules with PySpark
14 Hours
Stratio is a data-centric platform that integrates big data, AI, and governance into a single solution. Its Rocket and Intelligence modules enable rapid data exploration, transformation, and advanced analytics in enterprise environments.
This instructor-led, live training (online or onsite) is aimed at intermediate-level data professionals who wish to use the Rocket and Intelligence modules in Stratio effectively with PySpark, focusing on looping structures, user-defined functions, and advanced data logic.
By the end of this training, participants will be able to:
- Navigate and work within the Stratio platform using Rocket and Intelligence modules.
- Apply PySpark in the context of data ingestion, transformation, and analysis.
- Use loops and conditional logic to control data workflows and feature engineering tasks.
- Create and manage user-defined functions (UDFs) for reusable data operations in PySpark.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange the details.