Course Outline

Introduction
  • Stream processing vs batch processing
  • Analytics-focused stream processing
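A minimal sketch of the batch vs stream distinction. It is not taken from any of the frameworks covered in this course, and the function names are hypothetical. In batch processing, the whole bounded dataset is available up front and one result is produced at the end. In stream processing, events arrive one at a time from a possibly unbounded source, and the job keeps running state and emits an updated result after each event.

```python
from typing import Iterable, Iterator


def batch_total(events: list[int]) -> int:
    # Batch: the complete, bounded dataset is available up front,
    # so a single result is computed once at the end.
    return sum(events)


def stream_totals(events: Iterable[int]) -> Iterator[int]:
    # Stream: events are consumed one at a time from a possibly unbounded
    # source; state (the running total) is updated incrementally and an
    # up-to-date result is emitted after every event.
    running = 0
    for value in events:
        running += value
        yield running


data = [3, 1, 4, 1, 5]
print(batch_total(data))          # 14
print(list(stream_totals(data)))  # [3, 4, 8, 9, 14]
```

Because `stream_totals` is a generator, it never needs the full input in memory. Real streaming frameworks apply the same idea at scale and add partitioning, fault tolerance and windowing on top.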

Overview of Frameworks and Programming Languages

  • Spark Streaming (Scala)
  • Kafka Streams (Java)
  • Flink
  • Storm
  • Comparison of Features and Strengths of Each Framework

Overview of Data Sources

  • Live data as a series of events over time
  • Historical data sources

Deployment Options

  • In the cloud (AWS, etc.)
  • On premise (private cloud, etc.)

Getting Started

  • Setting up the Development Environment
  • Installing and Configuring
  • Assessing Your Data Analysis Needs

Operating a Streaming Framework

  • Integrating the Streaming Framework with Big Data Tools
  • Event Stream Processing (ESP) vs Complex Event Processing (CEP)
  • Transforming the Input Data
  • Inspecting the Output Data
  • Integrating the Stream Processing Framework with Existing Applications and Microservices
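A minimal sketch of the ESP vs CEP contrast, written in plain Python rather than in any framework's API. The event type, field names and the "repeated failed logins" rule are hypothetical examples. The sketch assumes that events arrive in timestamp order. In event stream processing (ESP), each event is filtered or transformed on its own. Complex event processing (CEP) keeps state across events to detect a pattern that spans several of them.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    user: str
    kind: str   # hypothetical event types: "login_failed" or "login_ok"
    ts: float   # event time in seconds; events are assumed to arrive in ts order


def esp_filter(events: Iterable[Event]) -> Iterator[Event]:
    # ESP: each event is handled independently (here, a simple filter);
    # no state is kept between events.
    for e in events:
        if e.kind == "login_failed":
            yield e


def cep_detect(events: Iterable[Event], threshold: int = 3,
               window: float = 60.0) -> Iterator[str]:
    # CEP: detect a pattern spanning several events, namely `threshold`
    # failed logins by the same user within `window` seconds.
    recent: dict[str, deque] = defaultdict(deque)
    for e in events:
        if e.kind != "login_failed":
            continue
        q = recent[e.user]
        q.append(e.ts)
        # Drop failures that are older than the time window.
        while q and e.ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            yield e.user
            q.clear()  # reset so one burst of failures raises one alert


events = [
    Event("alice", "login_failed", 0),
    Event("bob", "login_ok", 5),
    Event("alice", "login_failed", 10),
    Event("alice", "login_failed", 20),
    Event("bob", "login_failed", 30),
    Event("bob", "login_failed", 200),
]
print(len(list(esp_filter(events))))  # 5
print(list(cep_detect(events)))       # ['alice']
```

The ESP step emits every matching event. The CEP step emits one alert only when the multi-event pattern completes. Dedicated CEP libraries express such rules declaratively rather than with hand-written state.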

Summary and Conclusion

Requirements

  • Programming experience in any language
  • An understanding of Big Data concepts (Hadoop, etc.)

Duration

  21 hours