Data Engineering Integration for Developers Training Course

This course is designed for users of software version 10.5. Learn how to accelerate Data Engineering Integration through large-scale data ingestion, incremental loading, transformations, complex file processing, dynamic mappings, and Python scripting. Discover methods to reuse application logic in Data Engineering scenarios, with a focus on monitoring, troubleshooting, and best practices.

Objectives

Upon successful completion of this course, participants will be able to:

Ingest large volumes of data into Hive and HDFS
Conduct incremental loads within mass ingestion processes
Handle both initial and subsequent incremental data loads
Integrate with relational databases using SQOOP
Execute transformations across different engines
Run mappings via JDBC in Spark mode
Implement stateful computing and windowing techniques
Process complex file formats
Analyze hierarchical data on the Spark engine
Configure profiles and select sampling options on the Spark engine
Utilize Dynamic Mappings effectively
Generate Audits for Mappings
Monitor logs through REST Operations Hub
Use Log Aggregation to monitor logs and troubleshoot issues
Operate mappings within the Databricks environment
Create mappings to access Delta Lake tables
Optimize performance for Spark and Databricks jobs

This course is available as onsite live training in the United Arab Emirates or as online live training.

Course Outline

Module 1: Informatica Data Engineering Management Overview

Data Engineering concepts
Data Engineering Management features
Benefits of Data Engineering Management
Data Engineering Management architecture
Data Engineering Management developer tasks
Data Engineering Integration 10.4 new features

Module 2: Ingestion and Extraction in Hadoop

Integrating DEI with Hadoop cluster
Hadoop file systems
Data Ingestion to HDFS and Hive using SQOOP
Mass Ingestion to HDFS and Hive – Initial load
Mass Ingestion to HDFS and Hive – Incremental load
Lab: Configure SQOOP for processing data between an Oracle database and HDFS
Lab: Configure SQOOP for processing data between an Oracle database and Hive
Lab: Creating Mapping Specifications using Mass Ingestion Service
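
The difference between an initial load and an incremental load can be sketched in a few lines of plain Python. This is a minimal illustration only, not the Mass Ingestion Service or SQOOP: the function name `incremental_load`, the `updated_at` column, and the in-memory lists standing in for source and target tables are all assumptions made for the example.

```python
from datetime import datetime


def incremental_load(source_rows, target_rows, last_watermark):
    """Append rows modified after last_watermark and return the new watermark.

    Hypothetical helper: source_rows and target_rows are plain lists of dicts,
    and each row carries an assumed 'updated_at' timestamp column.
    """
    new_rows = [r for r in source_rows if r["updated_at"] > last_watermark]
    target_rows.extend(new_rows)
    # Keep the old watermark when nothing new arrived.
    return max((r["updated_at"] for r in new_rows), default=last_watermark)


source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
]
target = []

# Initial load: start from the earliest possible timestamp so every row qualifies.
watermark = incremental_load(source, target, datetime.min)

# A new row arrives in the source; the next run picks up only that row.
source.append({"id": 3, "updated_at": datetime(2024, 1, 9)})
watermark = incremental_load(source, target, watermark)
```

The stored watermark is what makes the second run incremental: only rows newer than the last recorded timestamp are copied.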

Module 3: Native and Hadoop Engine Strategy

Data Engineering Integration engine strategy
Hive Engine architecture
MapReduce
Tez
Spark architecture
Blaze architecture
Lab: Executing a mapping in Spark mode
Lab: Connecting to a Deployed Application

Module 4: Data Engineering Development Process

Advanced transformations in Data Engineering Integration: Python and Update Strategy
Hive ACID Use Case
Stateful Computing and Windowing
Lab: Creating a Reusable Python Transformation
Lab: Creating an Active Python Transformation
Lab: Performing Hive Upserts
Lab: Using Windowing Function LEAD
Lab: Using Windowing Function LAG
Lab: Creating a Macro Transformation
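
For readers new to the LEAD and LAG windowing functions named in the labs above, the following plain-Python sketch shows what they compute over a single ordered partition. It is not the product's expression syntax; the helper names `lag` and `lead` and the sample `sales` values are assumptions for illustration.

```python
def lag(values, offset=1, default=None):
    """Return, for each row, the value `offset` rows earlier (or `default`)."""
    return [
        values[i - offset] if i - offset >= 0 else default
        for i in range(len(values))
    ]


def lead(values, offset=1, default=None):
    """Return, for each row, the value `offset` rows later (or `default`)."""
    return [
        values[i + offset] if i + offset < len(values) else default
        for i in range(len(values))
    ]


# Rows are assumed to be already ordered within one partition.
sales = [100, 120, 90, 150]
previous_sale = lag(sales)   # [None, 100, 120, 90]
next_sale = lead(sales)      # [120, 90, 150, None]
```

Pairing each row with its neighbour this way is what makes calculations such as period-over-period differences possible without a self-join.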

Module 5: Complex File Processing

Data Engineering file formats – Avro, Parquet, JSON
Complex file data types – Structs, Arrays, Maps
Complex Configuration, Operators and Functions
Lab: Converting Flat File data object to an Avro file
Lab: Using complex data types - Arrays, Structs, and Maps in a mapping

Module 6: Hierarchical Data Processing

Hierarchical Data Processing
Flatten Hierarchical Data
Dynamic Flattening with Schema Changes
Hierarchical Data Processing with Schema Changes
Complex Configuration, Operators and Functions
Dynamic Ports
Dynamic Input Rules
Lab: Flattening a complex port in a Mapping
Lab: Building dynamic mappings using dynamic ports
Lab: Building dynamic mappings using input rules
Lab: Performing Dynamic Flattening of complex ports
Lab: Parsing Hierarchical Data on the Spark Engine
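
As a rough picture of what flattening hierarchical data means, the sketch below expands one nested record into flat rows in plain Python. It is an assumption-laden illustration, not how the Spark engine or the Developer tool implements flattening: the function `flatten`, the dotted column names, and the sample `order` record are all hypothetical.

```python
def flatten(record, prefix=""):
    """Expand one nested record into a list of flat rows.

    Structs (dicts) become dotted column names; arrays (lists) produce one
    output row per element. An empty array yields no rows for the record.
    """
    rows = [{}]
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            parts = flatten(value, prefix=f"{name}.")
        elif isinstance(value, list):
            parts = []
            for item in value:
                if isinstance(item, dict):
                    parts.extend(flatten(item, prefix=f"{name}."))
                else:
                    parts.append({name: item})
        else:
            parts = [{name: value}]
        # Combine every row built so far with every expansion of this field.
        rows = [{**row, **part} for row in rows for part in parts]
    return rows


order = {
    "id": 7,
    "customer": {"name": "Ana"},
    "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}
flat_rows = flatten(order)
```

Here the single `order` record becomes two rows, one per element of `items`, each repeating the `id` and `customer.name` values.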

Module 7: Mapping Optimization and Performance Tuning

Validation Environments
Execution Environment
Mapping Optimization
Mapping Recommendations and Insight
Scheduling, Queuing, and Node Labeling
Mapping Audits
Lab: Implementing Recommendation
Lab: Implementing Insight
Lab: Implementing Mapping Audits

Module 8: Monitoring Logs and Troubleshooting in Hadoop

Hadoop Environment Logs
Spark Engine Monitoring
Blaze Engine Monitoring
REST Operations Hub
Log Aggregator
Troubleshooting
Lab: Monitoring Mappings using REST Operations Hub
Lab: Viewing and analyzing logs using Log Aggregator

Module 9: Intelligent Structure Model

Intelligent Structure Discovery Overview
Intelligent Structure Model
Lab: Use an Intelligent Structure Model in a Mapping

Module 10: Databricks Overview

Databricks overview
Steps to configure Databricks
Databricks clusters
Notebooks, Jobs, and Data
Delta Lake

Module 11: Databricks Integration

Databricks Integration
Components of the Informatica and the Databricks environments
Run-time process on the Databricks Spark Engine
Databricks Integration Task Flow
Prerequisites for Databricks integration
Cluster Workflows
Demo: Set up Databricks connection
Demo: Run a mapping with Databricks Spark engine

Requirements

Developer Tool for Big Data Developers

Duration: 21 hours

Testimonials (2)

Very useful because it helps me understand what we can do with the data in our context. It will also help me

Nicolas NEMORIN - Adecco Groupe France

Course - KNIME Analytics Platform for BI

Vorraluck Sarechuer - Total Access Communication Public Company Limited (dtac)

Course - Talend Open Studio for ESB
