Apache Spark MLlib Training Course

MLlib is Spark's machine learning (ML) library. Its goal is to make practical machine learning scalable and easy to use. It provides common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.

The library is divided into two main packages:

spark.mllib contains the original API, built on top of RDDs.
spark.ml provides a higher-level API, built on top of DataFrames, for constructing ML pipelines.

Audience

This course is aimed at engineers and developers who want to use the machine learning library built into Apache Spark.

This course is available as onsite live training in the United Arab Emirates or as online live training.

Course Outline

spark.mllib: data types, algorithms, and utilities

Data types
Basic statistics
- summary statistics
- correlations
- stratified sampling
- hypothesis testing
- streaming significance testing
- random data generation
Classification and regression
- linear models (SVMs, logistic regression, linear regression)
- naive Bayes
- decision trees
- ensembles of trees (Random Forests and Gradient-Boosted Trees)
- isotonic regression
Collaborative filtering
- alternating least squares (ALS)
Clustering
- k-means
- Gaussian mixture
- power iteration clustering (PIC)
- latent Dirichlet allocation (LDA)
- bisecting k-means
- streaming k-means
Dimensionality reduction
- singular value decomposition (SVD)
- principal component analysis (PCA)
Feature extraction and transformation
Frequent pattern mining
- FP-growth
- association rules
- PrefixSpan
Evaluation metrics
PMML model export
Optimization (developer)
- stochastic gradient descent
- limited-memory BFGS (L-BFGS)

spark.ml: high-level APIs for ML pipelines

Overview: estimators, transformers and pipelines
Extracting, transforming and selecting features
Classification and regression
Clustering
Advanced topics
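
The estimator, transformer and pipeline ideas listed above can be illustrated with a small sketch. The code below is plain Python, not the Spark API: every class name in it (MeanCenterer, MeanCentererModel, Thresholder, Pipeline, PipelineModel) is hypothetical and chosen only for illustration. It assumes the usual convention that an estimator has a fit() method returning a fitted transformer, that a transformer has a transform() method, and that a pipeline chains stages in order.

```python
# Pure-Python sketch of the estimator / transformer / pipeline pattern.
# All class names are hypothetical; this is NOT the spark.ml API.


class MeanCentererModel:
    """Transformer produced by MeanCenterer.fit(); holds the learned mean."""

    def __init__(self, column, mean):
        self.column = column
        self.mean = mean

    def transform(self, rows):
        # Add a new "<column>_centered" field; input rows are not mutated.
        out_col = self.column + "_centered"
        return [{**row, out_col: row[self.column] - self.mean} for row in rows]


class MeanCenterer:
    """Estimator: fit() learns a column mean and returns a transformer."""

    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        values = [row[self.column] for row in rows]
        return MeanCentererModel(self.column, sum(values) / len(values))


class Thresholder:
    """Plain transformer: it has no fit() because it learns nothing."""

    def __init__(self, input_col, output_col, threshold):
        self.input_col = input_col
        self.output_col = output_col
        self.threshold = threshold

    def transform(self, rows):
        return [
            {**row, self.output_col: 1 if row[self.input_col] > self.threshold else 0}
            for row in rows
        ]


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Estimator that fits its stages one after another."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted, current = [], rows
        for stage in self.stages:
            # Estimators are fitted on the data as transformed so far;
            # plain transformers are used unchanged.
            model = stage.fit(current) if hasattr(stage, "fit") else stage
            fitted.append(model)
            current = model.transform(current)
        return PipelineModel(fitted)


training = [{"x": 1.0}, {"x": 2.0}, {"x": 6.0}]
pipeline = Pipeline([MeanCenterer("x"), Thresholder("x_centered", "above_mean", 0.0)])
model = pipeline.fit(training)          # learns mean(x) from the training rows
result = model.transform([{"x": 4.0}])  # reuses the learned mean on new data
print(result)
```

The point of the design is that everything learned from the training data lives in the fitted model, so the same sequence of steps can be replayed on new data without refitting. In spark.ml the stages operate on DataFrames rather than on lists of dictionaries.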

Requirements

Knowledge of one of the following:

Java
Scala
Python
R (SparkR)

Duration: 35 hours

Testimonials (1)

A lot of practical examples, different ways to approach the same problem, and sometimes not so obvious tricks how to improve the current solution

Rafal - Nordea

Course - Apache Spark MLlib
