Course Outline
Day 1 - Fundamental Big Data
- Understanding Big Data
- Fundamental Terminology & Concepts
- Big Data Business & Technology Drivers
- Traditional Enterprise Technologies Related to Big Data
- Characteristics of Data in Big Data Environments
- Dataset Types in Big Data Environments
- Fundamental Analysis and Analytics
- Machine Learning Types
- Business Intelligence & Big Data
- Data Visualization & Big Data
- Big Data Adoption & Planning Considerations
Day 2 - Big Data Analysis & Technology Concepts
- Big Data Analysis Lifecycle (from business case evaluation to data analysis and visualization)
- A/B Testing, Correlation (correlation and regression are illustrated in a brief sketch after this day's topics)
- Regression, Heat Maps
- Time Series Analysis
- Network Analysis
- Spatial Data Analysis
- Classification, Clustering
- Outlier Detection
- Filtering (including collaborative filtering & content-based filtering)
- Natural Language Processing
- Sentiment Analysis, Text Analytics
- File Systems & Distributed File Systems, NoSQL
- Distributed & Parallel Data Processing
- Processing Workloads, Clusters
- Cloud Computing & Big Data
- Foundational Big Data Technology Mechanisms
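For orientation only, here is a minimal sketch of two of the Day 2 techniques, correlation and simple linear regression, assuming Python with NumPy and toy data; it is illustrative and not part of the official courseware.

```python
# Illustrative only: correlation and least-squares regression on a toy dataset.
import numpy as np

# Hypothetical data: advertising spend vs. sales (toy numbers, not course material)
spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sales = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Pearson correlation coefficient between the two series
r = np.corrcoef(spend, sales)[0, 1]

# Simple linear regression (degree-1 polynomial fit): sales ≈ slope * spend + intercept
slope, intercept = np.polyfit(spend, sales, 1)

print(f"correlation r = {r:.3f}")
print(f"regression: sales = {slope:.2f} * spend + {intercept:.2f}")
```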
Day 3 - Fundamental Big Data Architecture
- New Big Data Mechanisms, including ...
- Security Engine
- Cluster Manager
- Data Governance Manager
- Visualization Engine
- Productivity Portal
- Data Processing Architectural Models, including ...
- Shared-Everything and Shared-Nothing Architectures
- Enterprise Data Warehouse and Big Data Integration Approaches, including ...
- Series
- Parallel
- Big Data Appliance
- Data Virtualization
- Architectural Big Data Environments, including ...
- ETL
- Analytics Engine
- Application Enrichment
- Cloud Computing & Big Data Architectural Considerations, including ...
- how Cloud Delivery and Deployment Models can be used to host and process Big Data Solutions
Day 4 - Advanced Big Data Architecture
- Big Data Solution Architectural Layers, including ...
- Data Sources
- Data Ingress and Storage
- Event Stream Processing and Complex Event Processing
- Egress
- Visualization and Utilization
- Big Data Architecture and Security
- Maintenance and Governance
- Big Data Solution Design Patterns, including ...
- Patterns pertaining to Data Ingress
- Data Wrangling
- Data Storage
- Data Processing
- Data Analysis
- Data Egress
- Data Visualization
- Big Data Architectural Compound Patterns
Day 5 - Big Data Architecture Lab
- Incorporates a set of detailed exercises that require delegates to solve various inter-related problems, with the goal of fostering a comprehensive understanding of how different data architecture technologies, mechanisms and techniques can be applied to solve problems in Big Data environments.
Testimonials
I generally liked Fernando's knowledge.
Valentin de Dianous - Informatique ProContact INC.
Related Courses
Apache Accumulo Fundamentals
21 hours
Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval. It is based on the design of Google's BigTable and is powered by Apache Hadoop, Apache Zookeeper, and Apache Thrift.
Apache Airflow
21 hours
Apache Airflow is a platform for authoring, scheduling and monitoring workflows. This instructor-led, live training (online or onsite) is aimed at data scientists who wish to use Apache Airflow to build and manage end-to-end data pipelines.
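For readers new to Airflow, a minimal sketch of what a pipeline definition looks like, assuming Airflow 2.4 or later; the DAG id, task names and schedule are hypothetical and illustrative, not material from the course itself.

```python
# Minimal, illustrative Airflow 2.x DAG with two dependent tasks (hypothetical names).
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull raw data from a source system
    return "raw data"

def load():
    # Placeholder: write transformed data to a target store
    pass

with DAG(
    dag_id="example_pipeline",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # run once per day
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task           # load runs after extract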
Apache Drill
21 hours
Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL and other Cloud and file storage systems. The power of Apache Drill lies in its ability to join data from multiple data stores using a single query.
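As a rough illustration of that cross-store join capability, the sketch below submits one query to Drill's REST API from Python. Port 8047 is Drill's default web/REST port; the storage plugin names (dfs, hive), file path, table and column names are hypothetical.

```python
# Illustrative only: join a JSON file (dfs plugin) with a Hive table through one Drill query.
# Assumes a local Drillbit on the default REST port 8047; plugin/table/column names are hypothetical.
import requests

query = """
    SELECT o.order_id, o.amount, c.region
    FROM dfs.`/data/orders.json` AS o
    JOIN hive.customers AS c ON o.cust_id = c.cust_id
"""

resp = requests.post(
    "http://localhost:8047/query.json",
    json={"queryType": "SQL", "query": query},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json().get("rows", []):
    print(row)
```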
Apache Drill Performance Optimization and Debugging
7 hours
Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL and other Cloud and file storage systems. The power of Apache Drill lies in its ability to join data from multiple data stores using a single query.
Apache Drill Query Optimization
7 hours
Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL and other Cloud and file storage systems. The power of Apache Drill lies in its ability to join data from multiple data stores using a single query.
Apache Hama
14 hours
Apache Hama is a framework based on the Bulk Synchronous Parallel (BSP) computing model and is primarily used for Big Data analytics. In this instructor-led, live training, participants will learn the fundamentals of Apache Hama.
Apache Arrow for Data Analysis across Disparate Data Sources
14 hours
Apache Arrow is an open-source in-memory data processing framework. It is often used together with other data science tools for accessing disparate data stores for analysis. It integrates well with other technologies such as GPU databases.
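A small, illustrative sketch of typical Arrow usage via the pyarrow bindings; the Parquet file name is hypothetical.

```python
# Illustrative only: read a Parquet file into an Arrow table, then hand it to pandas.
# Assumes the pyarrow package is installed; "events.parquet" is a hypothetical file.
import pyarrow.parquet as pq

table = pq.read_table("events.parquet")   # columnar, in-memory Arrow Table
print(table.schema)                        # column names and types

df = table.to_pandas()                     # convert for pandas-based analysis
print(df.head())
```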
Big Data & Database Systems Fundamentals
14 hours
The course is part of the Data Scientist skill set (Domain: Data and Technology).
Data Vault: Building a Scalable Data Warehouse
28 hours
Data Vault Modeling is a database modeling technique that provides long-term historical storage of data that originates from multiple sources. A data vault stores a single version of the facts, or "all the data, all the time".
Data Virtualization with Denodo Platform
14 hours
Denodo is a data virtualization platform for managing big data, logical data warehouses, and enterprise data operations. This instructor-led, live training (online or onsite) is aimed at architects, developers, and administrators.
Dremio for Self-Service Data Analysis
21 hours
Dremio is an open-source "self-service data platform" that accelerates the querying of different types of data sources. Dremio integrates with relational databases, Apache Hadoop, MongoDB, Amazon S3, ElasticSearch, and other data sources.
Apache Druid for Real-Time Data Analysis
21 hours
Apache Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data.
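To make the low-latency OLAP claim concrete, here is a short illustrative sketch that sends a Druid SQL query over HTTP, assuming a broker on Druid's default port 8082; the datasource and column names are hypothetical.

```python
# Illustrative only: run a Druid SQL aggregation over HTTP.
# Assumes a Druid broker on the default port 8082; datasource/columns are hypothetical.
import requests

sql = """
    SELECT channel, COUNT(*) AS events
    FROM "web_events"
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
    GROUP BY channel
    ORDER BY events DESC
"""

resp = requests.post("http://localhost:8082/druid/v2/sql/", json={"query": sql}, timeout=30)
resp.raise_for_status()
for row in resp.json():
    print(row)
```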
Apache Kylin: From Classic OLAP to Real-Time Data Warehouse
14 hours
Apache Kylin is an extreme, distributed analytics engine for big data. In this instructor-led live training, participants will learn how to use Apache Kylin to set up a real-time data warehouse.
Zeppelin for Interactive Data Analytics
14 hours
Apache Zeppelin is a web-based notebook for capturing, exploring, visualizing and sharing Hadoop and Spark based data. This instructor-led, live training introduces the concepts behind interactive data analytics.