Course Outline

Introduction

  • The Data Science Process
  • Roles and responsibilities of a Data Scientist

Preparing the Development Environment

  • Libraries, frameworks, languages and tools
  • Local development
  • Collaborative web-based development

Data Collection

  • Different Types of Data
    • Structured 
      • Local databases
      • Database connectors
      • Common formats: XLSX, XML, JSON, CSV, ...
    • Unstructured
      • Clicks, sensors, smartphones
      • APIs
      • Internet of Things (IoT)
      • Documents, pictures, videos, sounds
  • Case study: Collecting large amounts of unstructured data continuously

Data Storage

  • Relational databases
  • Non-relational databases
  • Hadoop: Distributed File System (HDFS)
  • Spark: Resilient Distributed Dataset (RDD)
  • Cloud storage

Data Preparation

  • Ingestion, selection, cleansing, and transformation
  • Ensuring data quality - correctness, meaningfulness, and security
  • Exception reports

Languages used for Preparation, Processing and Analysis

  • R language
    • Introduction to R
    • Data manipulation, calculation and graphical display
  • Python
    • Introduction to Python
    • Manipulating, processing, cleaning, and crunching data

Data Analytics

  • Exploratory analysis
    • Basic statistics
    • Draft visualizations
    • Understanding the data
  • Causality
  • Features and transformations
  • Machine Learning
    • Supervised vs. unsupervised
    • When to use which model
  • Natural Language Processing (NLP)

Data Visualization

  • Best Practices
  • Selecting the right chart for the right data
  • Color palettes
  • Taking it to the next level
    • Dashboards
    • Interactive Visualizations
  • Storytelling with data

Summary and Conclusion

Requirements

  • A general understanding of database concepts
  • A basic understanding of statistics

Duration

  35 hours
