Course Outline

Module 1: Informatica Data Engineering Management Overview

  • Data Engineering concepts
  • Data Engineering Management features
  • Benefits of Data Engineering Management
  • Data Engineering Management architecture
  • Data Engineering Management developer tasks
  • Data Engineering Integration 10.4 new features

Module 2: Ingestion and Extraction in Hadoop

  • Integrating DEI with a Hadoop cluster
  • Hadoop file systems
  • Data Ingestion to HDFS and Hive using SQOOP
  • Mass Ingestion to HDFS and Hive – Initial load
  • Mass Ingestion to HDFS and Hive – Incremental load
  • Lab: Configure SQOOP for processing data between an Oracle database and HDFS
  • Lab: Configure SQOOP for processing data between an Oracle database and Hive
  • Lab: Creating Mapping Specifications using the Mass Ingestion Service

Module 3: Native and Hadoop Engine Strategy

  • Data Engineering Integration engine strategy
  • Hive Engine architecture
  • MapReduce
  • Tez
  • Spark architecture
  • Blaze architecture
  • Lab: Executing a mapping in Spark mode
  • Lab: Connecting to a Deployed Application

Module 4: Data Engineering Development Process

  • Advanced Transformations in Data Engineering Integration: Python and Update Strategy
  • Hive ACID Use Case
  • Stateful Computing and Windowing
  • Lab: Creating a Reusable Python Transformation
  • Lab: Creating an Active Python Transformation
  • Lab: Performing Hive Upserts
  • Lab: Using Windowing Function LEAD
  • Lab: Using Windowing Function LAG
  • Lab: Creating a Macro Transformation

Module 5: Complex File Processing

  • Data Engineering file formats – Avro, Parquet, JSON
  • Complex file data types – Structs, Arrays, Maps
  • Complex Configuration, Operators and Functions
  • Lab: Converting a Flat File data object to an Avro file
  • Lab: Using complex data types (Arrays, Structs, and Maps) in a mapping

Module 6: Hierarchical Data Processing

  • Hierarchical Data Processing
  • Flatten Hierarchical Data
  • Dynamic Flattening with Schema Changes
  • Hierarchical Data Processing with Schema Changes
  • Complex Configuration, Operators and Functions
  • Dynamic Ports
  • Dynamic Input Rules
  • Lab: Flattening a complex port in a Mapping
  • Lab: Building dynamic mappings using dynamic ports
  • Lab: Building dynamic mappings using input rules
  • Lab: Performing Dynamic Flattening of complex ports
  • Lab: Parsing Hierarchical Data on the Spark Engine

Module 7: Mapping Optimization and Performance Tuning

  • Validation Environments
  • Execution Environment
  • Mapping Optimization
  • Mapping Recommendations and Insight
  • Scheduling, Queuing, and Node Labeling
  • Mapping Audits
  • Lab: Implementing Recommendation
  • Lab: Implementing Insight
  • Lab: Implementing Mapping Audits

Module 8: Monitoring Logs and Troubleshooting in Hadoop

  • Hadoop Environment Logs
  • Spark Engine Monitoring
  • Blaze Engine Monitoring
  • REST Operations Hub
  • Log Aggregator
  • Troubleshooting
  • Lab: Monitoring Mappings using REST Operations Hub
  • Lab: Viewing and analyzing logs using Log Aggregator

Module 9: Intelligent Structure Model

  • Intelligent Structure Discovery Overview
  • Intelligent Structure Model
  • Lab: Use an Intelligent Structure Model in a Mapping

Module 10: Databricks Overview

  • Databricks overview
  • Steps to configure Databricks
  • Databricks clusters
  • Notebooks, Jobs, and Data
  • Delta Lakes

Module 11: Databricks Integration

  • Databricks Integration
  • Components of the Informatica and Databricks environments
  • Run-time process on the Databricks Spark Engine
  • Databricks Integration Task Flow
  • Prerequisites for Databricks integration
  • Cluster Workflows
  • Demo: Set up Databricks connection
  • Demo: Run a mapping with Databricks Spark engine

Requirements

Developer Tool for Big Data Developers

  21 Hours
 
