Course Outline
Introduction
- The Data Science Process
- Roles and responsibilities of a Data Scientist
Preparing the Development Environment
- Libraries, frameworks, languages, and tools
- Local development setups
- Collaborative web-based development environments
Data Collection
- Types of Data
  - Structured Data
    - Local databases
    - Database connectors
    - Common formats: XLSX, XML, JSON, CSV, ...
  - Unstructured Data
    - Clicks, sensors, smartphones
    - APIs
    - Internet of Things (IoT)
    - Documents, images, videos, audio
- Case Study: Continuously collecting large volumes of unstructured data
Data Storage
- Relational databases
- Non-relational databases
- Hadoop: Distributed File System (HDFS)
- Spark: Resilient Distributed Dataset (RDD)
- Cloud storage solutions
Data Preparation
- Ingestion, selection, cleansing, and transformation
- Ensuring data quality: accuracy, relevance, and security
- Exception reporting
Languages Used for Preparation, Processing, and Analysis
- R Language
  - Introduction to R
  - Data manipulation, calculations, and graphical displays
- Python
  - Introduction to Python
  - Manipulating, processing, cleaning, and analyzing data
Data Analytics
- Exploratory Analysis
  - Basic statistics
  - Draft visualizations
  - Gaining data understanding
  - Causality
  - Feature engineering and transformations
- Machine Learning
  - Supervised vs. unsupervised learning
  - Model selection criteria
  - Natural Language Processing (NLP)
Data Visualization
- Best practices
- Selecting the appropriate chart for the data
- Color palettes
- Advanced visualization techniques
  - Dashboards
  - Interactive visualizations
  - Data storytelling
Summary and Conclusion
Requirements
- A general understanding of database concepts
- A basic understanding of statistics
Duration: 35 hours