Course Outline
Introduction:
- Apache Spark in Hadoop Ecosystem
- Short introduction to Python and Scala
Basics (theory):
- Architecture
- RDD
- Transformations and Actions
- Stages, Tasks, Dependencies
Understanding the basics using the Databricks environment (hands-on workshop):
- Exercises using RDD API
  - Basic action and transformation functions
  - PairRDD
  - Join
  - Caching strategies
- Exercises using DataFrame API
  - SparkSQL
  - DataFrame: select, filter, group, sort
  - UDF (User Defined Function)
- A look at the Dataset API
- Streaming
Understanding deployment using the AWS environment (hands-on workshop):
- Basics of AWS Glue
- Understand the differences between AWS EMR and AWS Glue
- Example jobs in both environments
- Understand pros and cons
Extra:
- Introduction to Apache Airflow orchestration
Requirements
Programming skills (preferably Python or Scala)
SQL basics
Testimonials
His pace was great. I loved the fact that he went into theory too, so that I understand WHY I would do the things he is asking.
Intelligent Medical Objects
The trainer adjusted the training slightly based on audience requests, shedding some light on a few different topics that we had asked about.
Intelligent Medical Objects
Having hands-on sessions and assignments.
Poornima Chenthamarakshan - Intelligent Medical Objects
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
The live examples that were given and showed the basic aspects of Spark.
Intelligent Medical Objects
This is a great class! I most appreciate that Andras explains very clearly what Spark is all about, where it came from, and what problems it is able to solve. Much better than other introductions I've seen that just dive into how to use it. Andras has a deep knowledge of the topic and explains things very well.
Intelligent Medical Objects
It was great to get an understanding of what is going on under the hood of Spark. Knowing what's going on under the hood helps to better understand why your code is or is not doing what you expect it to do. A lot of the training was hands-on, which is always great, and the section on optimizations was exceptionally relevant to my current work, which was nice.
Intelligent Medical Objects
It was very informative. I've had very little experience with Spark before and so far this course has provided a very good introduction to the subject.
Intelligent Medical Objects
The content and the knowledge.
Jobstreet.com Shared Services Sdn. Bhd.
Got to learn Spark Streaming, Databricks, and AWS Redshift.