Course Outline
Introduction
- Data at rest vs data in motion
Overview of Big Data Tools and Technologies
- Hadoop (HDFS and MapReduce) and Spark
Installing and Configuring NiFi
Overview of NiFi Architecture
Development Approaches
- Application development tools and mindset
- Extract, Transform, and Load (ETL) tools and mindset
Design Considerations
Components, Events, and Processor Patterns
Exercise: Streaming Data Feeds into HDFS
Error Handling
Controller Services
Exercise: Ingesting Data from IoT Devices using Web-Based APIs
Exercise: Developing a Custom Apache NiFi Processor using JSON
Testing and Troubleshooting
Contributing to Apache NiFi
Summary and Conclusion
Requirements
- Java programming experience.
- Experience with Maven.
Audience
- Developers
- Data engineers
Testimonials
I genuinely liked work exercises with cluster to see performance of nodes across cluster and extended functionality.
CACI Ltd
That I had it in the first place.
Peter Scales - CACI Ltd
Related Courses
Apache Ambari: Efficiently Manage Hadoop Clusters
21 hours
Apache Ambari is an open-source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. In this instructor-led live training participants will learn the management tools and practices provided by Ambari to
Administrator Training for Apache Hadoop
35 hours
Audience: The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment.
Goal: Deep knowledge on Hadoop cluster
Hadoop Administration
21 hours
This course is for IT specialists who are looking for a solution to store and process large data sets in a distributed system environment.
Course goal: Getting knowledge regarding Hadoop cluster
Hadoop For Administrators
21 hours
Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan
Hadoop for Business Analysts
21 hours
Apache Hadoop is the most popular framework for processing Big Data. Hadoop provides rich and deep analytics capability, and it is making inroads into the traditional BI analytics world. This course will introduce an analyst to the core components of
Hadoop for Developers (4 days)
28 hours
Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course will introduce a developer to various components (HDFS, MapReduce, Pig, Hive and HBase) Hadoop
Advanced Hadoop for Developers
21 hours
Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course delves into data management in HDFS, advanced Pig, Hive, and HBase. These advanced programming techniques will be beneficial to
Hadoop for Developers and Administrators
21 hours
Hadoop is the most popular Big Data processing framework.
Hadoop Administration on MapR
28 hours
Audience: This course is intended to demystify Big Data and Hadoop technology and to show that it is not difficult to understand.
HBase for Developers
21 hours
This course introduces HBase, a NoSQL store on top of Hadoop. The course is intended for developers who will be using HBase to develop applications, and administrators who will manage HBase clusters. We will walk a developer
Hortonworks Data Platform (HDP) for Administrators
21 hours
Hortonworks Data Platform (HDP) is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem. This instructor-led, live training (online or onsite) introduces
Data Analysis with Hive/HiveQL
7 hours
This course covers how to use the Hive SQL language (also known as Hive HQL, SQL on Hive, or HiveQL), for people who extract data from Hive
Impala for Business Intelligence
21 hours
Cloudera Impala is an open source massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters. Impala enables users to issue low-latency SQL queries to data stored in Hadoop Distributed File System and Apache
Apache Avro: Data Serialization for Distributed Applications
14 hours
Audience: Developers
Format of the Course: Lectures, hands-on practice, small tests along the way to gauge understanding
Apache NiFi for Administrators
21 hours
Apache NiFi (Hortonworks DataFlow) is a real-time integrated data logistics and simple event processing platform that enables the moving, tracking and automation of data between systems. It is written using flow-based programming and provides a