Course Outline
Introduction to Machine Learning
- Types of machine learning – supervised vs unsupervised.
- The evolution from statistical learning to machine learning.
- The data mining workflow: business understanding, data preparation, modeling, and deployment.
- Selecting the appropriate algorithm for specific tasks.
- Overfitting and the bias-variance tradeoff.
Overview of Python and ML Libraries
- The rationale for using programming languages in ML.
- Comparing R and Python.
- Python crash course and Jupyter Notebooks.
- Essential Python libraries: pandas, NumPy, scikit-learn, matplotlib, and seaborn.
Testing and Evaluating ML Algorithms
- Generalization, overfitting, and model validation.
- Evaluation strategies: holdout, cross-validation, and bootstrapping.
- Metrics for regression: mean error (ME), mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE).
- Metrics for classification: accuracy, confusion matrix, and handling unbalanced classes.
- Visualizing model performance: profit curve, ROC curve, and lift curve.
- Model selection and grid search for hyperparameter tuning.
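The regression and classification metrics listed above can be sketched in plain Python. This is a minimal illustration that uses only the standard library. The function names and toy numbers are hypothetical. Error is defined as actual minus predicted, so a positive ME means the model under-predicts on average.

```python
import math


def regression_metrics(actual, predicted):
    """Return ME, MSE, RMSE and MAPE for two equal-length lists of numbers."""
    errors = [a - p for a, p in zip(actual, predicted)]  # error = actual - predicted
    n = len(errors)
    me = sum(errors) / n                      # mean error: signed, reveals bias
    mse = sum(e * e for e in errors) / n      # mean squared error
    rmse = math.sqrt(mse)                     # back in the units of the target
    # MAPE is undefined if any actual value is zero
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"ME": me, "MSE": mse, "RMSE": rmse, "MAPE": mape}


def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == positive and p == positive),
        "FP": sum(1 for a, p in pairs if a != positive and p == positive),
        "FN": sum(1 for a, p in pairs if a == positive and p != positive),
        "TN": sum(1 for a, p in pairs if a != positive and p != positive),
    }


def accuracy(cm):
    """Share of all cases that were classified correctly."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())


metrics = regression_metrics([100, 200, 300], [110, 190, 300])
cm = confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(metrics)
print(cm, accuracy(cm))
```

Accuracy alone can mislead when classes are imbalanced. For example, if 95% of cases are negative, a model that always predicts "negative" reaches 95% accuracy and still finds no positives. This is why the confusion matrix is usually inspected alongside it.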
Data Preparation
- Importing and storing data in Python.
- Exploratory analysis and summary statistics.
- Managing missing values and outliers.
- Standardization, normalization, and transformation techniques.
- Recoding qualitative data and data wrangling with pandas.
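Three common preparation steps from this module are mean imputation of missing values, z-score standardization and min-max normalization. Below is a minimal standard-library sketch of each. It assumes missing values are stored as `None`, and all function names are illustrative. In practice pandas and scikit-learn provide these operations.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """z-score each value: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    # sample standard deviation (divides by n - 1); scikit-learn's StandardScaler
    # divides by n instead, so its results differ slightly on small samples
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def min_max(values):
    """Rescale values to the range [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


raw = [2.0, None, 6.0, 10.0]
filled = impute_mean(raw)
print(filled)
print(standardize(filled))
print(min_max(filled))
```

Standardization suits distance- and gradient-based methods such as k-NN, SVMs and neural networks. Min-max scaling keeps values in a fixed range but is sensitive to outliers, because a single extreme value stretches the range.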
Classification Algorithms
- Distinguishing between binary and multiclass classification.
- Logistic regression and discriminant functions.
- Naïve Bayes and k-nearest neighbors.
- Decision trees: CART, Random Forests, Bagging, Boosting, and XGBoost.
- Support Vector Machines and kernel methods.
- Ensemble learning techniques.
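Of the classifiers listed above, k-nearest neighbors is the easiest to write out by hand. A new point is labeled by majority vote among the k closest training points. The sketch below uses only the standard library and assumes Euclidean distance (`math.dist`, Python 3.8+). The training data is made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, nearest first
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up two-class data: class "A" near (1, 1), class "B" near (6, 6)
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))
print(knn_predict(train_X, train_y, (6.5, 6.5)))
```

With two classes, an odd k avoids tied votes. Because the method relies on distances, features should normally be standardized first.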
Regression and Numerical Prediction
- Least squares and variable selection.
- Regularization methods: L1 (lasso) and L2 (ridge).
- Polynomial regression and nonlinear models.
- Regression trees and splines.
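For a single predictor, least squares, ridge (L2) and lasso (L1) all have closed forms. This makes the effect of regularization on the slope easy to see. The sketch below assumes the intercept is not penalized and that the penalty is added to the plain sum of squared errors. Libraries such as scikit-learn scale their penalty terms differently, so their parameter values are not directly comparable. The function names are illustrative.

```python
import math


def _centered_sums(x, y):
    """Return mean(x), mean(y), Sxy and Sxx for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return mx, my, sxy, sxx


def fit_ridge(x, y, lam=0.0):
    """Minimize sum((y - a - b*x)**2) + lam * b**2; lam=0 gives ordinary least squares."""
    mx, my, sxy, sxx = _centered_sums(x, y)
    b = sxy / (sxx + lam)
    return my - b * mx, b  # (intercept, slope); the intercept is not penalized


def fit_lasso(x, y, lam):
    """Minimize sum((y - a - b*x)**2) + lam * abs(b) via soft-thresholding."""
    mx, my, sxy, sxx = _centered_sums(x, y)
    # shrink |Sxy| by lam/2; once the penalty wins, the slope is exactly zero
    b = math.copysign(max(abs(sxy) - lam / 2, 0.0), sxy) / sxx
    return my - b * mx, b


x, y = [1, 2, 3, 4], [3, 5, 7, 9]  # toy data lying exactly on y = 1 + 2x
print(fit_ridge(x, y))             # ordinary least squares
print(fit_ridge(x, y, lam=5.0))    # slope shrunk toward zero
print(fit_lasso(x, y, lam=20.0))   # slope set exactly to zero
```

The two penalties behave differently. Ridge shrinks the slope smoothly but never to exactly zero, while lasso can zero it out entirely. This is why L1 regularization doubles as a variable-selection method.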
Neural Networks
- Introduction to neural networks and deep learning.
- Activation functions, layers, and backpropagation.
- Multilayer perceptrons (MLP).
- Using TensorFlow or PyTorch for basic neural network modeling.
- Applying neural networks to classification and regression tasks.
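To make "layers" and "activation functions" concrete, the sketch below runs a forward pass through a tiny multilayer perceptron with two inputs, two ReLU hidden units and one linear output. It computes XOR, which no single linear model can represent. The weights are hand-picked rather than learned. In TensorFlow or PyTorch, backpropagation would be used to find such weights from data.

```python
def relu(z):
    """Rectified linear activation: max(0, z)."""
    return max(0.0, z)


# Hand-picked (not trained) weights for a 2-2-1 network that computes XOR
W_HIDDEN = [[1.0, 1.0], [1.0, 1.0]]  # one row of input weights per hidden unit
B_HIDDEN = [0.0, -1.0]
W_OUT = [1.0, -2.0]
B_OUT = 0.0


def forward(x):
    """Forward pass: inputs -> ReLU hidden layer -> linear output unit."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_HIDDEN, B_HIDDEN)
    ]
    return sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(x))
```

The nonlinearity does the work here. Without the ReLU, the two layers collapse into a single linear function, and XOR cannot be fitted.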
Sales Forecasting and Predictive Analytics
- Differentiating between time series and regression-based forecasting.
- Managing seasonal and trend-based data.
- Constructing a sales forecasting model using ML techniques.
- Assessing forecast accuracy and uncertainty.
- Interpreting and communicating results to business stakeholders.
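When assessing forecast accuracy, it helps to start from a simple baseline that any ML model should beat. One such baseline is the seasonal naive forecast, which repeats the value from the same period one season earlier. The sketch below uses made-up quarterly sales, holds out the last year in time order rather than sampling randomly, and scores the forecast with MAE and MAPE. Function names are illustrative.

```python
def seasonal_naive(history, season_length, horizon):
    """Forecast each future period with the value observed one full season earlier."""
    start = len(history) - season_length  # index where the last complete season begins
    return [history[start + h % season_length] for h in range(horizon)]


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error (undefined if any actual value is zero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Made-up quarterly sales: two years for fitting, the third year held out
sales = [120, 150, 170, 130, 126, 158, 180, 136, 130, 165, 188, 140]
train, test = sales[:8], sales[8:]
forecast = seasonal_naive(train, season_length=4, horizon=4)
print(forecast, mae(test, forecast), mape(test, forecast))
```

The seasonal naive method captures seasonality but ignores the upward trend in this series, so every forecast falls below the actual value. A model that also captures trend should improve on it.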
Unsupervised Learning
- Clustering techniques: k-means, k-medoids, hierarchical clustering, and self-organizing maps (SOMs).
- Dimensionality reduction: PCA, factor analysis, and SVD.
- Multidimensional scaling.
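The k-means idea from the clustering topic can be sketched with Lloyd's algorithm. It alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. This minimal version uses only the standard library. It seeds the centroids with the first k points for reproducibility, whereas library implementations typically use smarter seeding such as k-means++. The data is made up.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm for k-means clustering on tuples of numbers."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization: first k points
    for _ in range(iterations):
        # assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (8, 8), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why k-means is usually run several times with different initializations.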
Text Mining
- Text preprocessing and tokenization.
- Bag-of-words, stemming, and lemmatization.
- Sentiment analysis and word frequency analysis.
- Visualizing text data using word clouds.
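Tokenization, the bag-of-words representation and word-frequency counts can be sketched with the standard library alone. The stop-word list and documents below are illustrative assumptions. Stemming and lemmatization are not shown, since they normally come from a library such as NLTK.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; real projects use a curated one
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "of"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return a sorted vocabulary and one word-count vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    return vocabulary, [[c[w] for w in vocabulary] for c in counts]


docs = ["The delivery was fast and the price was fair",
        "Slow delivery and the price is high"]
vocabulary, vectors = bag_of_words(docs)
frequencies = Counter(t for d in docs for t in tokenize(d))
print(vocabulary)
print(vectors)
print(frequencies)
```

Each document becomes a vector of counts over a shared vocabulary, and word order is discarded. The corpus-wide frequencies are the same counts a word cloud uses to size its words.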
Recommendation Systems
- User-based and item-based collaborative filtering.
- Designing and evaluating recommendation engines.
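Item-based collaborative filtering predicts a user's rating for an item from that user's ratings of similar items. The sketch below measures item similarity with cosine similarity computed over users who rated both items. This is one common convention; others treat missing ratings as zero or subtract user means first. The ratings and function names are hypothetical.

```python
import math

# Hypothetical user -> {item: rating} data; absent items are unrated
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "B": 2, "C": 5},
}


def item_vector(item):
    """All ratings an item received, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(u, v):
    """Cosine similarity between two items over the users who rated both."""
    common = u.keys() & v.keys()
    dot = sum(u[k] * v[k] for k in common)
    norm_u = math.sqrt(sum(u[k] ** 2 for k in common))
    norm_v = math.sqrt(sum(v[k] ** 2 for k in common))
    return dot / (norm_u * norm_v) if dot else 0.0


def predict(user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    target = item_vector(item)
    num = den = 0.0
    for other, rating in ratings[user].items():
        sim = cosine(target, item_vector(other))
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


print(predict("bob", "C"))
```

User-based filtering follows the same pattern with the roles swapped: it finds users similar to the target user and averages their ratings for the item.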
Association Pattern Mining
- Frequent itemsets and the Apriori algorithm.
- Market basket analysis and lift ratio.
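A compact sketch of the Apriori idea, using the standard library and made-up baskets, is shown below. The search is level-wise: a k-itemset is considered only if all of its (k-1)-subsets are frequent. The sketch also computes support, confidence and lift for a single rule. It illustrates the logic only and is not tuned for large transaction sets.

```python
from itertools import combinations

# Made-up transactions
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def apriori(min_support):
    """Return {frequent itemset: support} using level-wise Apriori pruning."""
    frequent = {}
    level = [frozenset([i]) for i in sorted(set().union(*baskets))]
    k = 1
    while level:
        current = {s: support(s) for s in level if support(s) >= min_support}
        frequent.update(current)
        k += 1
        # candidates: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune any candidate with an infrequent (k-1)-subset
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def rule_metrics(antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return both, confidence, lift


print(apriori(min_support=0.4))
print(rule_metrics({"bread"}, {"milk"}))
```

A lift above 1 means the items occur together more often than they would if independent. Here the lift of bread to milk is slightly below 1, so buying bread says little about buying milk, even though the rule's confidence of 0.75 looks high.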
Outlier Detection
- Extreme value analysis.
- Distance-based and density-based methods.
- Outlier detection in high-dimensional data.
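Two of the approaches listed above can be sketched briefly with the standard library. Extreme value analysis is illustrated with Tukey's IQR fences. For the distance-based approach, each point is scored by its distance to its k-th nearest neighbor. Density-based methods such as the local outlier factor (available in scikit-learn as `LocalOutlierFactor`) are not shown. The data and function names are illustrative.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor (higher = more isolated)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


print(iqr_outliers([10, 12, 11, 13, 12, 11, 40]))
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print(knn_distance_scores(points))
```

In high-dimensional data, distances between points tend to become similar to one another, which weakens simple distance-based scores like this one. That problem motivates the specialized methods covered in the last topic of the module.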
Machine Learning Case Study
- Defining the business problem.
- Data preprocessing and feature engineering.
- Model selection and parameter tuning.
- Evaluation and presentation of findings.
- Deployment.
Summary and Next Steps
Requirements
- Foundational understanding of machine learning concepts, including supervised and unsupervised learning.
- Proficiency in Python programming (variables, loops, functions).
- Prior experience with data handling using libraries such as pandas or NumPy is beneficial but not mandatory.
- No previous exposure to advanced modeling or neural networks is assumed.
Target Audience
- Data scientists.
- Business analysts.
- Software engineers and technical professionals involved with data.