Course Outline
Introduction
- Stream processing vs batch processing
- Analytics-focused stream processing
Overview Frameworks and Programming Languages
- Spark Streaming (Scala)
- Kafka Streaming (Java)
- Flink
- Storm
- Comparison of Features and Strengths of Each Framework
Overview of Data Sources
- Live data as a series of events over time
- Historical data sources
Deployment Options
- In the cloud (AWS, etc.)
- On premise (private cloud, etc.)
Getting Started
- Setting up the Development Environment
- Installing and Configuring
- Assessing Your Data Analysis Needs
Operating a Streaming Framework
- Integrating the Streaming Framework with Big Data Tools
- Event Stream Processing (ESP) vs Complex Event Processing (CEP)
- Transforming the Input Data
- Inspecting the Output Data
- Integrating the Stream Processing Framework with Existing Applications and Microservices
Troubleshooting
Summary and Conclusion
Requirements
- Programming experience in any language
- An understanding of Big Data concepts (Hadoop, etc.)
Testimonials
The lab exercises. Applying the theory from the first day in subsequent days.
- Dell
The trainer was passionate and well-known what he said I appreciate his help and answers all our questions and suggested cases.
The trainer was passionate and well-known what he said I appreciate his help and answers all our questions and suggested cases.
Related Courses
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP
21 hoursThis course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and
Apache Spark MLlib
35 hoursMLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative
Apache Ignite for Developers
14 hoursApache Ignite is an in-memory computing platform that sits between the application and data layer to improve speed, scale, and availability. In this instructor-led, live training, participants will learn the principles behind persistent and pure
Apache Apex: Processing Big Data-in-Motion
21 hoursApache Apex is a YARN-native platform that unifies stream and batch processing. It processes big data-in-motion in a way that is scalable, performant, fault-tolerant, stateful, secure, distributed, and easily operable. This instructor-led, live
Unified Batch and Stream Processing with Apache Beam
14 hoursApache Beam is an open source, unified programming model for defining and executing parallel data processing pipelines. It's power lies in its ability to run both batch and streaming pipelines, with execution being carried out by one of
Apache Flink Fundamentals
28 hoursApache Flink is an open-source framework for scalable stream and batch data processing. This instructor-led, live training introduces the principles and approaches behind distributed stream and batch data processing, and walks participants
Introduction to Graph Computing
28 hoursMany real world problems can be described in terms of graphs. For example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set
Confluent KSQL
7 hoursConfluent KSQL is a stream processing framework built on top of Apache Kafka. It enables real-time data processing using SQL operations. This instructor-led, live training (online or onsite) is aimed at developers who wish to implement Apache
Apache NiFi for Administrators
21 hoursApache NiFi (Hortonworks DataFlow) is a real-time integrated data logistics and simple event processing platform that enables the moving, tracking and automation of data between systems. It is written using flow-based programming and provides a
Apache NiFi for Developers
7 hoursApache NiFi (Hortonworks DataFlow) is a real-time integrated data logistics and simple event processing platform that enables the moving, tracking and automation of data between systems. It is written using flow-based programming and provides a
Samza for Stream Processing
14 hoursApache Samza is an open-source near-realtime, asynchronous computational framework for stream processing. It uses Apache Kafka for messaging, and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource
Tigon: Real-time Streaming for the Real World
14 hoursTigon is an open-source, real-time, low-latency, high-throughput, native YARN, stream processing framework that sits on top of HDFS and HBase for persistence. Tigon applications address use cases such as network intrusion detection and analytics,
Python and Spark for Big Data (PySpark)
21 hoursPython is a high-level programming language famous for its clear syntax and code readibility. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this
Spark Streaming with Python and Kafka
7 hoursApache Spark Streaming is a scalable, open source stream processing system that allows users to process real-time data from supported sources. Spark Streaming enables fault-tolerant processing of data streams. This instructor-led, live
Apache Storm
28 hoursApache Storm is a distributed, real-time computation engine used for enabling real-time business intelligence. It does so by enabling applications to reliably process unbounded streams of data (a.k.a. stream processing). "Storm is for