Course Outline

Quick Overview

  • Data Sources
  • Mining Data
  • Recommender systems
  • Target Marketing


  • Structured vs unstructured
  • Static vs streamed
  • Attitudinal, behavioural and demographic data
  • Data-driven vs user-driven analytics
  • Data validity
  • Volume, velocity and variety of data


  • Building models
  • Statistical Models
  • Machine learning

Data Classification

  • Clustering
  • kGroups, k-means, nearest neighbours
  • Ant colonies, bird flocking
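
As an illustration of the k-means item above, here is a minimal sketch in plain Python. The function name `k_means`, the sample points, and the choice of the first k points as initial centroids are assumptions made for this example; practical implementations usually pick starting centroids more carefully.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Minimal k-means sketch. Initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Points that share a label belong to the same cluster; the centroids summarise each group.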

Predictive Models

  • Decision trees
  • Support vector machines
  • Naive Bayes classification
  • Neural networks
  • Markov models
  • Regression
  • Ensemble methods


  • Benefit/Cost ratio
  • Cost of software
  • Cost of development
  • Potential benefits
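
The benefit/cost ratio listed above can be sketched as a short calculation. The function name and all monetary figures below are hypothetical and serve only to show the arithmetic: expected benefits divided by the sum of software and development costs.

```python
def benefit_cost_ratio(benefits, software_cost, development_cost):
    """Return expected benefits divided by total cost (software plus development)."""
    total_cost = software_cost + development_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return benefits / total_cost


# Hypothetical figures, in any single currency.
ratio = benefit_cost_ratio(benefits=150_000, software_cost=20_000, development_cost=80_000)
print(ratio)
```

A ratio above 1 means the expected benefits exceed the combined costs.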

Building Models

  • Data Preparation (MapReduce)
  • Data cleansing
  • Choosing methods
  • Developing the model
  • Testing the model
  • Model evaluation
  • Model deployment and integration
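
The steps above can be sketched end to end on a toy data set. Everything in this example is an assumption made for illustration: the records, the rule for dropping incomplete rows, the two-record holdout, and the choice of a one-variable least-squares line as the method.

```python
from statistics import mean

# Hypothetical raw records (x, y); one record has a missing target value.
raw = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2), (5, 9.8), (6, 12.1), (7, 13.9), (8, 16.0)]

# Data cleansing: drop records with a missing target value.
clean = [(x, y) for x, y in raw if y is not None]

# Hold out the last two records for testing; train on the rest.
train, test = clean[:-2], clean[-2:]


def fit_line(data):
    """Developing the model: ordinary least squares for y = slope * x + intercept."""
    xs, ys = zip(*data)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum((x - x_bar) ** 2 for x in xs)
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(train)

# Testing and evaluation: mean squared error on the held-out records.
mse = mean((y - (slope * x + intercept)) ** 2 for x, y in test)
print(slope, intercept, mse)
```

A low error on the held-out records suggests the model generalises beyond the training data, which is the check to make before deployment.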

Overview of Open Source and commercial software

  • Selection of R-project packages
  • Python libraries
  • Hadoop and Mahout
  • Selected Apache projects related to Big Data and Analytics
  • Selected commercial solutions
  • Integration with existing software and data sources


Understanding of traditional data management and analysis methods such as SQL, data warehouses, business intelligence, and OLAP. Understanding of basic statistics and probability (mean, variance, probability, conditional probability, etc.).

  21 Hours

