Big Data - Data Science Training Course

This classroom-based training session explores Big Data. Delegates will work through computer-based examples and case study exercises using relevant big data tools.

Course Outline

Big data fundamentals
- Big Data and its role in the corporate world
- The phases of development of a Big Data strategy within a corporation
- Explain the rationale underlying a holistic approach to Big Data
- Components needed in a Big Data Platform
- Big data storage solution
- Limits of Traditional Technologies
- Overview of database types
- The four dimensions of Big Data
Big data impact on business
- Business importance of Big Data
- Challenges of extracting useful data
- Integrating Big data with traditional data
Big data storage technologies
- Overview of big data technologies
  - Data storage models
  - Hadoop
  - Hive
  - Cassandra
  - MongoDB
- Choosing the right big data technology
Processing big data
- Connecting and extracting data from database
- Transforming and preparing data for processing
- Using Hadoop MapReduce for processing distributed data
- Monitoring and executing Hadoop MapReduce jobs
- Hadoop distributed file system building blocks
- MapReduce and YARN
- Handling streaming data with Spark
Big data analysis tools and technologies
- Programming Hadoop with Pig Latin language
- Querying big data with Hive
- Mining data with Mahout
- Visualizing and reporting tools
Big data in business
- Managing and establishing Big Data needs
- Business importance of Big Data
- Selecting the right big data tools for the problem
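
The outline item "Using Hadoop MapReduce for processing distributed data" refers to a map, shuffle and reduce pattern. The sketch below illustrates that pattern in plain Python on a tiny in-memory sample. It is not the Hadoop API, it runs on one machine, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample input; a real job would read splits from HDFS.
sample_lines = ["big data tools", "big data storage", "data"]
counts = word_count(sample_lines)
print(counts)
```

In a real cluster the map and reduce steps would run on different nodes and the shuffle would move data across the network; here all three are ordinary function calls so the data flow is easy to follow.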

Data Warehousing Concepts

What is a Data Warehouse?
Difference between OLTP and Data Warehousing
Data Acquisition
Data Extraction
Data Transformation
Data Loading
Data Marts
Dependent vs Independent Data Marts
Database design
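
The extraction, transformation and loading steps listed above can be sketched end to end with the Python standard library. This is a minimal illustration under stated assumptions, not course material: the CSV text, the `orders` table, and the function names `extract`, `transform` and `load` are all hypothetical, and an in-memory SQLite database stands in for a warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real extraction would read from files
# or from an operational (OLTP) database.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,4.25
3,alice,
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names, convert types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, skipped in this sketch
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
totals = connection.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping the three steps as separate functions mirrors how ETL jobs are usually tested: each stage can be checked on its own before the whole flow is run.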

ETL Testing Concepts:

Introduction
Software development life cycle
Testing methodologies
ETL testing workflow process
ETL testing responsibilities in DataStage

NoSQL Databases

Hadoop

MapReduce

Apache Spark

Requirements

Delegates should have an awareness of, and some experience with, storage tools, and an awareness of handling large data sets.

14 Hours

Dates are subject to availability and take place between 09:30 and 16:30.

Testimonials (1)

trainer's knowledge

Fatma Badi - Dubai Electricity & Water Authority

Course - Big Data - Data Science

Related Courses

Data Vault: Building a Scalable Data Warehouse

28 Hours

In this instructor-led, live training in the UAE, participants will learn how to build a Data Vault.

By the end of this training, participants will be able to:

Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse.
Develop a consistent and repeatable ETL (Extract, Transform, Load) process.
Build and deploy highly scalable and repeatable warehouses.

Spark Streaming with Python and Kafka

7 Hours

This instructor-led, live training in the UAE (online or onsite) is aimed at data engineers, data scientists, and programmers who wish to use Spark Streaming features in processing and analyzing real-time data.

By the end of this training, participants will be able to use Spark Streaming to process live data streams for use in databases, filesystems, and live dashboards.

Confluent KSQL

7 Hours

This instructor-led, live training in the UAE (online or onsite) is aimed at developers who wish to implement Apache Kafka stream processing without writing code.

By the end of this training, participants will be able to:

Install and configure Confluent KSQL.
Set up a stream processing pipeline using only SQL commands (no Java or Python coding).
Carry out data filtering, transformations, aggregations, joins, windowing, and sessionization entirely in SQL.
Design and deploy interactive, continuous queries for streaming ETL and real-time analytics.

Apache Ignite for Developers

14 Hours

This instructor-led, live training in the UAE (online or onsite) is aimed at developers who wish to learn the principles behind persistent and pure in-memory storage as they step through the creation of a sample in-memory computing project.

By the end of this training, participants will be able to:

Use Ignite for in-memory, on-disk persistence as well as a purely distributed in-memory database.
Achieve persistence without syncing data back to a relational database.
Use Ignite to carry out SQL and distributed joins.
Improve performance by moving data closer to the CPU, using RAM as storage.
Spread data sets across a cluster to achieve horizontal scalability.
Integrate Ignite with RDBMS, NoSQL, Hadoop and machine learning processors.

Unified Batch and Stream Processing with Apache Beam

14 Hours

Apache Beam is an open source, unified programming model for defining and executing parallel data processing pipelines. Its power lies in its ability to run both batch and streaming pipelines, with execution carried out by one of Beam's supported distributed processing back-ends: Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam is useful for ETL (Extract, Transform, and Load) tasks such as moving data between different storage media and data sources, transforming data into a more desirable format, and loading data onto a new system.

In this instructor-led, live training (onsite or remote), participants will learn how to implement the Apache Beam SDKs in a Java or Python application that defines a data processing pipeline for decomposing a big data set into smaller chunks for independent, parallel processing.
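
The idea of decomposing a data set into smaller chunks for independent, parallel processing can be sketched without any Beam dependency. The example below uses only the Python standard library: a thread pool stands in for distributed workers, and the names `split_into_chunks`, `process_chunk` and `run_pipeline` are hypothetical. It illustrates the chunk-and-combine pattern only and does not use the Apache Beam SDK.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Decompose the input into consecutive chunks of at most chunk_size items."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Per-chunk work that needs no data from other chunks: a sum of squares."""
    return sum(value * value for value in chunk)


def run_pipeline(records, chunk_size=4, workers=3):
    """Process every chunk independently, then combine the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


# Hypothetical input: the integers 0 to 9.
records = list(range(10))
total = run_pipeline(records)
print(total)
```

Because each chunk is processed without reference to the others, the combined result does not depend on how many workers are used or in what order the chunks finish.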

By the end of this training, participants will be able to:

Install and configure Apache Beam.
Use a single programming model to carry out both batch and stream processing from within their Java or Python application.
Execute pipelines across multiple environments.

Format of the Course

Part lecture, part discussion, exercises and heavy hands-on practice

Note

This course will be available in Scala in the future. Please contact us to arrange.

Apache Apex: Processing Big Data-in-Motion

21 Hours

Apache Apex is a YARN-native platform that unifies stream and batch processing. It processes big data-in-motion in a way that is scalable, performant, fault-tolerant, stateful, secure, distributed, and easily operable.

This instructor-led, live training introduces Apache Apex's unified stream processing architecture, and walks participants through the creation of a distributed application using Apex on Hadoop.

By the end of this training, participants will be able to:

Understand data processing pipeline concepts such as connectors for sources and sinks, common data transformations, etc.
Build, scale and optimize an Apex application
Process real-time data streams reliably and with minimum latency
Use Apex Core and the Apex Malhar library to enable rapid application development
Use the Apex API to write and re-use existing Java code
Integrate Apex into other applications as a processing engine
Tune, test and scale Apex applications

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

Apache Storm

28 Hours

Apache Storm is a distributed, real-time computation engine used for enabling real-time business intelligence. It does so by enabling applications to reliably process unbounded streams of data (a.k.a. stream processing).

"Storm is for real-time processing what Hadoop is for batch processing!"

In this instructor-led live training, participants will learn how to install and configure Apache Storm, then develop and deploy an Apache Storm application for processing big data in real-time.

Topics covered in this training include:

Apache Storm in the context of Hadoop
Working with unbounded data
Continuous computation
Real-time analytics
Distributed RPC and ETL processing

Audience

Software and ETL developers
Mainframe professionals
Data scientists
Big data analysts
Hadoop professionals

Format of the course

Part lecture, part discussion, exercises and heavy hands-on practice

Apache NiFi for Administrators

21 Hours

In this instructor-led, live training in the UAE (onsite or remote), participants will learn how to deploy and manage Apache NiFi in a live lab environment.

By the end of this training, participants will be able to:

Install and configure Apache NiFi.
Source, transform and manage data from disparate, distributed data sources, including databases and big data lakes.
Automate dataflows.
Enable streaming analytics.
Apply various approaches for data ingestion.
Transform Big Data into business insights.

Apache NiFi for Developers

7 Hours

In this instructor-led, live training in the UAE, participants will learn the fundamentals of flow-based programming as they develop a number of demo extensions, components and processors using Apache NiFi.

By the end of this training, participants will be able to:

Understand NiFi's architecture and dataflow concepts.
Develop extensions using NiFi and third-party APIs.
Develop their own custom Apache NiFi processor.
Ingest and process real-time data from disparate and uncommon file formats and data sources.

Apache Flink Fundamentals

28 Hours

This instructor-led, live training in the UAE (online or onsite) introduces the principles and approaches behind distributed stream and batch data processing, and walks participants through the creation of a real-time, data streaming application in Apache Flink.

By the end of this training, participants will be able to:

Set up an environment for developing data analysis applications.
Understand how Apache Flink's graph-processing library (Gelly) works.
Package, execute, and monitor Flink-based, fault-tolerant, data streaming applications.
Manage diverse workloads.
Perform advanced analytics.
Set up a multi-node Flink cluster.
Measure and optimize performance.
Integrate Flink with different Big Data systems.
Compare Flink capabilities with those of other big data processing frameworks.

Python and Spark for Big Data (PySpark)

21 Hours

In this instructor-led, live training in the UAE, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.

By the end of this training, participants will be able to:

Use Spark with Python to analyze Big Data.
Work on exercises that mimic real world cases.
Use different tools and techniques for big data analysis using PySpark.

Introduction to Graph Computing

28 Hours

In this instructor-led, live training in the UAE, participants will learn about the technology offerings and implementation approaches for processing graph data. The aim is to identify real-world objects, their characteristics and relationships, then model these relationships and process them as data using a Graph Computing (also known as Graph Analytics) approach. We start with a broad overview and narrow in on specific tools as we step through a series of case studies, hands-on exercises and live deployments.

By the end of this training, participants will be able to:

Understand how graph data is persisted and traversed.
Select the best framework for a given task (from graph databases to batch processing frameworks).
Implement Hadoop, Spark, GraphX and Pregel to carry out graph computing across many machines in parallel.
View real-world big data problems in terms of graphs, processes and traversals.

Apache Spark MLlib

35 Hours

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.

It divides into two packages:

spark.mllib contains the original API built on top of RDDs.
spark.ml provides a higher-level API built on top of DataFrames for constructing ML pipelines.

Audience

This course is directed at engineers and developers seeking to use MLlib, the built-in machine learning library for Apache Spark.

Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP

21 Hours

This course is intended for developers and data scientists who want to understand and implement artificial intelligence in their applications. Special focus is placed on data analytics, distributed AI, and natural language processing.

Knowledge Discovery in Databases (KDD)

21 Hours

Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing.

In this instructor-led, live course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes.

Audience

Data analysts or anyone interested in learning how to interpret data to solve problems

Format of the Course

After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations.
