Spark for Developers Training Course

OBJECTIVE:

This course provides an introduction to Apache Spark. Students will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis. The curriculum covers using the Spark shell for interactive data analysis, Spark internals, the Spark APIs, Spark SQL, Spark Streaming, machine learning with MLlib, and GraphX.

AUDIENCE:

Developers / Data Analysts

This course is available as onsite live training in the United Arab Emirates or as online live training.

Course Outline

Scala primer
- A quick introduction to Scala
- Labs : Getting to know Scala
Spark Basics
- Background and history
- Spark and Hadoop
- Spark concepts and architecture
- Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
- Labs : Installing and running Spark
First Look at Spark
- Running Spark in local mode
- Spark web UI
- Spark shell
- Analyzing a dataset – part 1
- Inspecting RDDs
- Labs: Spark shell exploration
RDDs
- RDD concepts
- Partitions
- RDD Operations / transformations
- RDD types
- Key-Value pair RDDs
- MapReduce on RDD
- Caching and persistence
- Labs : creating & inspecting RDDs; Caching RDDs
Spark API programming
- Introduction to Spark API / RDD API
- Submitting the first program to Spark
- Debugging / logging
- Configuration properties
- Labs : Programming in Spark API, Submitting jobs
Spark SQL
- SQL support in Spark
- DataFrames
- Defining tables and importing datasets
- Querying DataFrames using SQL
- Storage formats : JSON / Parquet
- Labs : Creating and querying DataFrames; evaluating data formats
MLlib
- MLlib intro
- MLlib algorithms
- Labs : Writing MLlib applications
GraphX
- GraphX library overview
- GraphX APIs
- Labs : Processing graph data using Spark
Spark Streaming
- Streaming overview
- Evaluating Streaming platforms
- Streaming operations
- Sliding window operations
- Labs : Writing Spark Streaming applications
Spark and Hadoop
- Hadoop Intro (HDFS / YARN)
- Hadoop + Spark architecture
- Running Spark on Hadoop YARN
- Processing HDFS files using Spark
Spark Performance and Tuning
- Broadcast variables
- Accumulators
- Memory management & caching
Spark Operations
- Deploying Spark in production
- Sample deployment templates
- Configurations
- Monitoring
- Troubleshooting
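
As an illustration of the "Key-Value pair RDDs" and "MapReduce on RDD" topics in the outline, here is a minimal plain-Python sketch of the map and reduce-by-key pattern. It does not use the Spark API and is not taken from the course labs; the helper names map_phase and reduce_by_key and the sample lines are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit one (word, 1) pair per word, lowercased.

    Hypothetical helper: mirrors the idea of splitting lines into words
    and mapping each word to a key-value pair.
    """
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_by_key(pairs, func):
    """Group values by key, then fold each group with func.

    Hypothetical helper: a single-machine stand-in for a reduce-by-key step.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


# Sample input, invented for this sketch.
lines = ["spark and hadoop", "spark sql", "spark streaming and mllib"]

# Word count: map each word to (word, 1), then sum the counts per word.
counts = reduce_by_key(map_phase(lines), lambda a, b: a + b)
```

In Spark itself the same idea is usually expressed with the RDD operations flatMap, map, and reduceByKey, which spread the work across partitions instead of running it in a single process.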

Requirements

PRE-REQUISITES

Familiarity with Java, Scala, or Python (the labs are in Scala and Python)
Basic understanding of a Linux development environment (command-line navigation, editing files using vi or nano)

21 Hours

Testimonials (6)

Doing similar exercises in different ways really helps in understanding what each component (Hadoop/Spark, standalone/cluster) can do on its own and together. It gave me ideas on how I should test my application on my local machine when I develop vs when it is deployed on a cluster.

Thomas Carcaud - IT Frankfurt GmbH

Course - Spark for Developers

Ajay was very friendly, helpful and also knowledgeable about the topic he was discussing.

Biniam Guulay - ICE International Copyright Enterprise Germany GmbH

Course - Spark for Developers

Ernesto did a great job explaining the high level concepts of using Spark and its various modules.

Michael Nemerouf

Course - Spark for Developers

The trainer made the class interesting and entertaining which helps quite a bit with all day training.

Ryan Speelman

Course - Spark for Developers

We know a lot more about the whole environment.

John Kidd

Course - Spark for Developers

Richard is very calm and methodical, with an analytic insight - exactly the qualities needed to present this sort of course.

Kieran Mac Kenna

Course - Spark for Developers
