Course Outline
spark.mllib: data types, algorithms, and utilities
- Data types
- Basic statistics
  - summary statistics
  - correlations
  - stratified sampling
  - hypothesis testing
  - streaming significance testing
  - random data generation
- Classification and regression
  - linear models (SVMs, logistic regression, linear regression)
  - naive Bayes
  - decision trees
  - ensembles of trees (Random Forests and Gradient-Boosted Trees)
  - isotonic regression
- Collaborative filtering
  - alternating least squares (ALS)
- Clustering
  - k-means
  - Gaussian mixture
  - power iteration clustering (PIC)
  - latent Dirichlet allocation (LDA)
  - bisecting k-means
  - streaming k-means
- Dimensionality reduction
  - singular value decomposition (SVD)
  - principal component analysis (PCA)
- Feature extraction and transformation
- Frequent pattern mining
  - FP-growth
  - association rules
  - PrefixSpan
- Evaluation metrics
- PMML model export
- Optimization (developer)
  - stochastic gradient descent
  - limited-memory BFGS (L-BFGS)
spark.ml: high-level APIs for ML pipelines
- Overview: estimators, transformers and pipelines
- Extracting, transforming and selecting features
- Classification and regression
- Clustering
- Advanced topics
Requirements
Knowledge of one of the following:
- Java
- Scala
- Python
- SparkR
Testimonials
The trainer was passionate and knew his subject well. I appreciate his help; he answered all our questions and suggested cases.
A Practical Introduction to Stream Processing Course
This is one of the best-quality online trainings I have ever taken in my 13-year career. Keep up the great work!
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP Course
No rigid approach to conducting training. Flexibility. Without unnecessary formalities, "Mr.", "Mrs.", "ą", "ę".
Beata Szylhabel, Krajowy Rejestr Długów Biuro Informacji Gospodarczej S.A.
Apache Spark Fundamentals Course
- Training using practical examples.
- Very well prepared materials and environment for independent exercises.
- Frequent suggestions/advice drawn from the trainer's practice.
Beata Szylhabel, Krajowy Rejestr Długów Biuro Informacji Gospodarczej S.A.
Apache Spark Fundamentals Course
The trainer's practical experience: he neither sugarcoated the discussed solutions nor cast them in a negative light. I feel that the trainer is preparing me for real and practical use of the tool - these valuable details are usually not found in books.
Krzysztof Miodek - Beata Szylhabel, Krajowy Rejestr Długów Biuro Informacji Gospodarczej S.A.
Apache Spark Fundamentals Course
The live examples
Ahmet Bolat - Edina Kiss, Accenture Industrial SS
Python, Spark, and Hadoop for Big Data Course
I liked that it managed to lay the foundations of the topic and go to some quite advanced exercises. Also provided easy ways to write/test the code.
Ionut Goga - Edina Kiss, Accenture Industrial SS
Python, Spark, and Hadoop for Big Data Course
The fact that we were able to take with us most of the information/course/presentation/exercises done, so that we can look over them and perhaps redo what we didn't understand the first time or improve what we already did.
Raul Mihail Rat - Edina Kiss, Accenture Industrial SS
Python, Spark, and Hadoop for Big Data Course
The trainer was always open for questions and willing to answer and explain everything. He seems to have very good and deep knowledge of what he is teaching. We were able to focus more on topics that might bring value for us since we were only two students.
DEVK Deutsche Eisenbahn Versicherung Sach- und HUK-Versicherungsverein a.G.
Hadoop and Spark for Administrators Course
very interactive...
Richard Langford - Nadia Naidoo, Jembi Health Systems NPC
SMACK Stack for Data Science Course
Jorge was amazing - he is super knowledgeable and has a lot of information to share.
Nadia Naidoo, Jembi Health Systems NPC
SMACK Stack for Data Science Course
Getting to learn Spark Streaming, Databricks and AWS Redshift.
Lim Meng Tee - Jobstreet.com Shared Services Sdn. Bhd.
Apache Spark in the Cloud Course
The content and the knowledge.
Jobstreet.com Shared Services Sdn. Bhd.
Apache Spark in the Cloud Course
It was very informative. I've had very little experience with Spark before and so far this course has provided a very good introduction to the subject.
Intelligent Medical Objects
Apache Spark in the Cloud Course
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Big Data Analytics in Health Course
The lab exercises. Applying the theory from the first day in subsequent days.
Dell
A Practical Introduction to Stream Processing Course
Sufficient hands-on work, and the trainer is knowledgeable.
Chris Tan
A Practical Introduction to Stream Processing Course
Doing similar exercises in different ways really helps in understanding what each component (Hadoop/Spark, standalone/cluster) can do on its own and together. It gave me ideas on how I should test my application on my local machine when I develop vs. when it is deployed on a cluster.
Thomas Carcaud - IT Frankfurt GmbH
Spark for Developers Course
Ajay is very personable and a pleasant speaker. He is nice and seems super knowledgeable in many of these areas. He made himself available and his GitHub is a great resource!
Credit Karma
Spark for Developers Course
This is one of the best hands-on programming courses with exercises I have ever taken.
Laura Kahn
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP Course