Course Outline
Introduction
- Introducing Big Data: Evolution over the Years
- The Characteristics of Big Data
- Identifying Different Sources of Big Data
- How Big Data Is Used in Business?
The Challenges of Big Data
- Identifying the Challenges of Big Data: Current and Emerging Challenges
- Why Are Businesses Struggling with Big Data?
- State of Big Data Projects
- Understanding the Layers of Big Data Architecture
- Big Data Management: An Introduction
- Defining the Capabilities of Big Data Management
- Overcoming Obstacles with Big Data Management
Building Blocks of Efficient Big Data Management
- The Big Data Laboratory versus Big Data Factory
- Understanding the Three Pillars of Data Management
- Data Integration
- Data Governance
- Data Security
- Understanding the Functions of Big Data Management Processes
- Competencies of the Big Data Team
Implementing Big Data Management
- Implementing Big Data Management
- Identifying Big Data Tools
- Leveraging the Right Tools
- What Are Commercial Tools Built atop Open Source Projects?
- How to Combine Integration, Governance, and Security?
Conclusion - Tips for Succeeding with Big Data Management
- Use Cases to Provide Business Value
- Identifying Data Quality Issues Early
- Aligning Your Vocabulary
- Centralizing and Automating Your Data Management
- Leveraging Data Lakes
- Collaborative Methods for Data Governance
- Using a 360-Degree View of Your Data and Relationships
- How to Work with Vendors to Accelerate Your Deployments?
Requirements
There are no specific requirements needed to attend this course.
Testimonials
The scope of the material
Maciej Jonczyk
Systematizing knowledge in the field of ML
Orange Polska
I really benefited from the trainer's willingness to share more.
Balaram Chandra Paul
I generally benefited from the presentation of technologies.
Continental AG / Abteilung: CF IT Finance
Overall, the content was good.
Sameer Rohadia
Michael, the trainer, is very knowledgeable and skillful in the subject of Big Data and R. He is very flexible and quickly customizes the training to meet clients' needs. He is also very capable of solving technical and subject matter problems on the go. Fantastic and professional training!
Xiaoyuan Geng - Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
I really enjoyed the introduction of new packages.
Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
The tutor, Mr. Michael An, interacted with the audience very well, and the instruction was clear. He also went out of his way to add more information based on requests from the students during the training.
Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
The subject matter and the pace were perfect.
Tim - Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
The examples and training material were sufficient and made it easy to understand what you were doing.
Teboho Makenete
Richard's training style kept it interesting; the real-world examples helped drive the concepts home.
Jamie Martin-Royle - NBrown Group
The content; I found it very interesting and think it will help me in my final year at university.
Krishan Mistry - NBrown Group
I generally liked Fernando's knowledge.
Valentin de Dianous - Informatique ProContact INC.
The broad coverage of the subjects
- Roche
Intensity, Training materials and expertise, Clarity, Excellent communication with Alessandra
Marija Hornis Dmitrovic - Marija Hornis
R programming
Osden Jokonya - University of the Western Cape
Practical exercises
JOEL CHIGADA - University of the Western Cape
Related Courses
Apache Accumulo Fundamentals
21 hours. Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval. It is based on the design of Google's BigTable and is powered by Apache Hadoop, Apache Zookeeper, and Apache Thrift.
Apache Airflow
21 hours. Apache Airflow is a platform for authoring, scheduling, and monitoring workflows. This instructor-led, live training (online or onsite) is aimed at data scientists who wish to use Apache Airflow to build and manage end-to-end data pipelines.
Apache Drill
21 hours. Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL, and other cloud and file storage systems. The power of Apache Drill lies in its ability to join data from multiple data stores using a single query.
Apache Drill Performance Optimization and Debugging
7 hours. Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL, and other cloud and file storage systems. The power of Apache Drill lies in its ability to join data from multiple data stores using a single query.
Apache Drill Query Optimization
7 hours. Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL, and other cloud and file storage systems. The power of Apache Drill lies in its ability to join data from multiple data stores using a single query.
Apache Hama
14 hours. Apache Hama is a framework based on the Bulk Synchronous Parallel (BSP) computing model and is primarily used for Big Data analytics. In this instructor-led, live training, participants will learn the fundamentals of Apache Hama.
Apache Arrow for Data Analysis across Disparate Data Sources
14 hours. Apache Arrow is an open-source in-memory data processing framework. It is often used together with other data science tools for accessing disparate data stores for analysis. It integrates well with other technologies such as GPU databases.
Big Data & Database Systems Fundamentals
14 hours. The course is part of the Data Scientist skill set (Domain: Data and Technology).
Data Vault: Building a Scalable Data Warehouse
28 hours. Data Vault Modeling is a database modeling technique that provides long-term historical storage of data that originates from multiple sources. A data vault stores a single version of the facts, or "all the data, all the time".
Data Virtualization with Denodo Platform
14 hours. Denodo is a data virtualization platform for managing big data, logical data warehouses, and enterprise data operations. This instructor-led, live training (online or onsite) is aimed at architects, developers, and administrators.
Dremio for Self-Service Data Analysis
21 hours. Dremio is an open-source "self-service data platform" that accelerates the querying of different types of data sources. Dremio integrates with relational databases, Apache Hadoop, MongoDB, Amazon S3, ElasticSearch, and other data sources.
Apache Druid for Real-Time Data Analysis
21 hours. Apache Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data.
Apache Kylin: From Classic OLAP to Real-Time Data Warehouse
14 hours. Apache Kylin is an extreme, distributed analytics engine for big data. In this instructor-led, live training, participants will learn how to use Apache Kylin to set up a real-time data warehouse.
Zeppelin for Interactive Data Analytics
14 hours. Apache Zeppelin is a web-based notebook for capturing, exploring, visualizing, and sharing Hadoop- and Spark-based data. This instructor-led, live training introduces the concepts behind interactive data analytics.