Course Outline

Introduction

  • The Data Science Process
  • Roles and responsibilities of a Data Scientist

Preparing the Development Environment

  • Libraries, frameworks, languages, and tools
  • Local development setups
  • Collaborative web-based development environments

Data Collection

  • Types of Data
    • Structured Data
      • Local databases
      • Database connectors
      • Common formats: XLSX, XML, JSON, CSV, ...
    • Unstructured Data
      • Clicks, sensors, smartphones
      • APIs
      • Internet of Things (IoT)
      • Documents, images, videos, audio
  • Case Study: Continuously collecting large volumes of unstructured data

Data Storage

  • Relational databases
  • Non-relational databases
  • Hadoop: Distributed File System (HDFS)
  • Spark: Resilient Distributed Dataset (RDD)
  • Cloud storage solutions

Data Preparation

  • Ingestion, selection, cleansing, and transformation
  • Ensuring data quality: accuracy, relevance, and security
  • Exception reporting

Languages Used for Preparation, Processing, and Analysis

  • R Language
    • Introduction to R
    • Data manipulation, calculations, and graphical displays
  • Python
    • Introduction to Python
    • Manipulating, processing, cleaning, and analyzing data

Data Analytics

  • Exploratory Analysis
    • Basic statistics
    • Draft visualizations
    • Gaining data understanding
  • Causality
  • Feature engineering and transformations
  • Machine Learning
    • Supervised vs. unsupervised learning
    • Model selection criteria
  • Natural Language Processing (NLP)

Data Visualization

  • Best practices
  • Selecting the appropriate chart for the data
  • Color palettes
  • Advanced visualization techniques
    • Dashboards
    • Interactive visualizations
  • Data storytelling

Summary and Conclusion

Requirements

  • A general understanding of database concepts
  • A basic understanding of statistics

Duration

  • 35 hours