Course Outline
- Machine Learning introduction
- Types of Machine Learning – supervised vs unsupervised learning
- From Statistical learning to Machine learning
- The Data Mining workflow:
- Business understanding
- Data Understanding
- Data preparation
- Modelling
- Evaluation
- Deployment
- Machine learning algorithms
- Choosing an appropriate algorithm for the problem
- Overfitting and bias-variance tradeoff in ML
- ML libraries and programming languages
- Why use a programming language
- Choosing between R and Python
- Python crash course
- Python resources
- Python Libraries for Machine learning
- Jupyter notebooks and interactive coding
- Testing ML algorithms
- Generalization and overfitting
- Avoiding overfitting
- Holdout method
- Cross-Validation
- Bootstrapping
- Evaluating numerical predictions
- Measures of accuracy: ME, MSE, RMSE, MAPE
- Parameter and prediction stability
- Evaluating classification algorithms
- Accuracy and its problems
- The confusion matrix
- Unbalanced classes problem
- Visualizing model performance
- Profit curve
- ROC curve
- Lift curve
- Model selection
- Model tuning – grid search strategies
- Examples in Python
- Data preparation
- Data import and storage
- Understand the data – basic explorations
- Data manipulations with pandas library
- Data transformations – Data wrangling
- Exploratory analysis
- Missing observations – detection and solutions
- Outliers – detection and strategies
- Standardization, normalization, binarization
- Qualitative data recoding
- Examples in Python
- Classification
- Binary vs multiclass classification
- Classification via mathematical functions
- Linear discriminant functions
- Quadratic discriminant functions
- Logistic regression and probability approach
- k-nearest neighbors
- Naïve Bayes
- Decision trees
- CART
- Bagging
- Random Forests
- Boosting
- XGBoost
- Support Vector Machines and kernels
- Maximal Margin Classifier
- Support Vector Machine
- Ensemble learning
- Examples in Python
- Regression and numerical prediction
- Least squares estimation
- Variable selection techniques
- Regularization and stability – L1, L2
- Nonlinearities and generalized least squares
- Polynomial regression
- Regression splines
- Regression trees
- Examples in Python
- Unsupervised learning
- Clustering
- Centroid-based clustering – k-means, k-medoids, PAM, CLARA
- Hierarchical clustering – DIANA, AGNES
- Model-based clustering – EM
- Self-organising maps
- Cluster evaluation and assessment
- Dimensionality reduction
- Principal component analysis and factor analysis
- Singular value decomposition
- Multidimensional Scaling
- Examples in Python
- Clustering
- Text mining
- Preprocessing data
- The bag-of-words model
- Stemming and lemmatization
- Analyzing word frequencies
- Sentiment analysis
- Creating word clouds
- Examples in Python
- Recommendation engines and collaborative filtering
- Recommendation data
- User-based collaborative filtering
- Item-based collaborative filtering
- Examples in Python
- Association pattern mining
- Frequent itemsets algorithm
- Market basket analysis
- Examples in Python
- Outlier Analysis
- Extreme value analysis
- Distance-based outlier detection
- Density-based methods
- High-dimensional outlier detection
- Examples in Python
- Machine Learning case study
- Business problem understanding
- Data preprocessing
- Algorithm selection and tuning
- Evaluation of findings
- Deployment
Requirements
Knowledge and awareness of Machine Learning fundamentals
Testimonials (3)
Even with having to miss a day due to customer meetings, I feel I have a much clearer understanding of the processes and techniques used in Machine Learning and when I would use one approach over another. Our challenge now is to practice what we have learned and start to apply it to our problem domain.
Richard Blewett - Rock Solid Knowledge Ltd
Course - Machine Learning – Data science
I like that the training was focused on examples and coding. I thought that it was impossible to pack so much content into three days of training, but I was wrong. The training covered many topics and everything was done in a very detailed manner (especially tuning of the model's parameters - I didn't expect that there would be time for this and I was greatly surprised).
Bartosz Rosiek - GE Medical Systems Polska Sp. Zoo
Course - Machine Learning – Data science
It shows many methods with pre-prepared scripts - very nicely prepared materials that are easy to trace back