Course Outline

Week 1 Big Data concepts

  • Definition of the four Vs (Velocity, Volume, Variety, Veracity)
  • Limits to traditional data processing capacity
  • Distributed Processing
  • Statistical Analysis
  • Machine Learning Analysis Types
  • Data Visualization
  • Distributed Processing (e.g. map-reduce)
  • Introduction to the languages used
  • R language crash course
  • Python crash course
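
To give a flavour of the map-reduce idea listed above, here is a minimal single-process Python sketch of the classic word-count example. The function names `map_phase`, `reduce_phase` and `word_count` are illustrative only and not part of any framework; on a real cluster (for example with Spark) each map call would run on a separate node and the framework would handle distributing and merging the partial results.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_phase(chunk: str) -> Counter:
    """Map step: turn one chunk of text into partial word counts."""
    return Counter(chunk.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(chunks: Iterable[str]) -> Counter:
    # On a cluster each map_phase call would run on a different node;
    # in this sketch they simply run one after another in one process.
    partial_counts = map(map_phase, chunks)
    return reduce(reduce_phase, partial_counts, Counter())


if __name__ == "__main__":
    documents = ["big data is big", "data beats opinion"]
    print(word_count(documents).most_common(2))
```

Because merging partial counts is associative, the reduce step can be applied in any grouping, which is what allows the work to be spread across many machines.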

Weeks 2&3 Performing Data Analysis

  • Statistical Analysis
  • Descriptive Statistics in Big Data Sets (e.g. calculating the mean)
  • Inferential Statistics (estimating)
  • Forecasting with Correlation and Regression models
  • Time Series analysis
  • Basics of Machine Learning
  • Supervised vs unsupervised learning
  • Classification and clustering
  • Estimating the cost of specific methods
  • Filter
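
Calculating a mean over a big data set is a good example of why distributed statistics need care: averaging per-chunk means gives the wrong answer when the chunks differ in size. The minimal Python sketch below assumes the data is already split into chunks (one per hypothetical worker); it carries (sum, count) pairs instead and divides only at the end. The function names are illustrative.

```python
from functools import reduce
from typing import Iterable, Sequence, Tuple

Partial = Tuple[float, int]  # (running sum, number of values)


def summarize_chunk(chunk: Sequence[float]) -> Partial:
    """Compute the partial aggregate for one chunk (one worker's share)."""
    return (sum(chunk), len(chunk))


def combine(a: Partial, b: Partial) -> Partial:
    """Merge two partial aggregates; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(chunks: Iterable[Sequence[float]]) -> float:
    # Summarize each chunk independently, then merge the small partials.
    total, count = reduce(combine, map(summarize_chunk, chunks), (0.0, 0))
    if count == 0:
        raise ValueError("mean of an empty data set is undefined")
    return total / count
```

For the chunks `[1, 2, 3]`, `[4, 5]` and `[6]` this gives 21 / 6 = 3.5, whereas averaging the three chunk means (2, 4.5 and 6) would give about 4.17.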

Week 4 Natural Language Processing

  • Processing text
  • Understanding meaning of the text
  • Automatic text generation
  • Sentiment/Topic Analysis
  • Computer Vision
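
As a very small illustration of text processing and sentiment analysis, the sketch below tokenizes text with a regular expression and scores it against a toy word list. The `POSITIVE` and `NEGATIVE` lexicons are hypothetical and far too small for real use; practical sentiment analysis relies on much larger curated lexicons or trained models.

```python
import re
from typing import List

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into tokens of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> int:
    """Add 1 for each positive token and subtract 1 for each negative token."""
    return sum(int(tok in POSITIVE) - int(tok in NEGATIVE) for tok in tokenize(text))


if __name__ == "__main__":
    for review in ["I love this great course", "The room was bad, the coffee terrible"]:
        print(review, "->", sentiment_score(review))
```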

Weeks 5&6 Tooling concepts

  • Data storage solutions (SQL, NoSQL, hierarchical, object-oriented, document-oriented)
  • (MySQL, Cassandra, MongoDB, Elasticsearch, HDFS, etc.)
  • Choosing the right solution for the problem
  • Distributed Processing
  • Spark
  • Machine Learning with Spark (MLlib)
  • Spark SQL
  • Scalability
  • Public cloud (AWS, Google, etc.)
  • Private cloud (OpenStack, Cloud Foundry)
  • Autoscaling

Week 7 Soft Skills

  • Advisory & Leadership Skills
  • Making an impact: data-driven storytelling
  • Understanding your audience
  • Effective data presentation - getting your message across
  • Influence effectiveness and change leadership
  • Handling difficult situations


  • End of Programme graduation exam


Participants should have a good grounding in maths, at least to high school level.

Programming skills are not required, though any programming experience will be useful.

Participants will be assessed and interviewed prior to participation in this training programme.

  245 Hours

