Course Outline

Introduction

This module gives an overview of when to use machine learning, what to consider before doing so, and the concepts behind it, including its advantages and limitations. Topics include data types (structured, unstructured, static, streamed), data integrity and volume, data-driven versus user-driven analytics, and statistical versus machine learning models. It also covers the challenges of unsupervised learning, the bias-variance trade-off, iteration and evaluation, cross-validation, and the differences between supervised, unsupervised, and reinforcement learning.

MAJOR TOPICS

1. Mastering Naive Bayes

  • Core concepts of Bayesian methods
  • Foundations of probability
  • Joint probability
  • Conditional probability using Bayes' theorem
  • The Naive Bayes algorithm
  • Naive Bayes classification
  • The Laplace estimator
  • Using numeric features with Naive Bayes

2. Mastering Decision Trees

  • Divide and conquer strategies
  • The C5.0 decision tree algorithm
  • Selecting the optimal split
  • Pruning the decision tree

3. Mastering Neural Networks

  • Evolution from biological to artificial neurons
  • Activation functions
  • Network topology
  • Configuring the number of layers
  • Information flow direction
  • Determining node counts per layer
  • Training neural networks via backpropagation
  • Deep Learning

4. Mastering Support Vector Machines

  • Classification using hyperplanes
  • Identifying the maximum margin
  • Handling linearly separable data
  • Addressing non-linearly separable data
  • Utilizing kernels for non-linear spaces

5. Mastering Clustering

  • Clustering as a machine learning objective
  • The k-means clustering algorithm
  • Using distance metrics to assign and update clusters
  • Determining the optimal number of clusters

6. Evaluating Classification Performance

  • Handling classification prediction data
  • Deep dive into confusion matrices
  • Utilizing confusion matrices for performance assessment
  • Beyond accuracy – alternative performance metrics
  • The kappa statistic
  • Sensitivity and specificity
  • Precision and recall
  • The F-measure
  • Visualizing performance trade-offs
  • ROC curves
  • Estimating future performance
  • The holdout method
  • Cross-validation
  • Bootstrap sampling

7. Optimizing Standard Models for Improved Performance

  • Employing caret for automated parameter tuning
  • Constructing a basic tuned model
  • Customizing the tuning workflow
  • Enhancing model efficacy with meta-learning
  • Understanding ensemble methods
  • Bagging
  • Boosting
  • Random forests
  • Training random forests
  • Evaluating random forest performance

MINOR TOPICS

8. Understanding Nearest Neighbor Classification

  • The kNN algorithm
  • Distance calculation methods
  • Selecting an appropriate k value
  • Preparing data for kNN application
  • Why is the kNN algorithm considered lazy?

9. Understanding Classification Rules

  • Separate and conquer approach
  • The One Rule algorithm
  • The RIPPER algorithm
  • Deriving rules from decision trees

10. Understanding Regression

  • Simple linear regression
  • Ordinary least squares estimation
  • Correlations
  • Multiple linear regression

11. Understanding Regression Trees and Model Trees

  • Incorporating regression into trees

12. Understanding Association Rules

  • The Apriori algorithm for association rule learning
  • Assessing rule interest – support and confidence
  • Constructing a rule set using the Apriori principle

Extras

  • Spark/PySpark/MLlib and Multi-armed bandits

Requirements

Knowledge of Python

Duration: 21 Hours
