Course Outline

Introduction

  • Apache Spark vs Hadoop MapReduce

Overview of Apache Spark Features and Architecture

Choosing a Programming Language

Setting up Apache Spark

Creating a Sample Application

Choosing the Data Set

Running Data Analysis on the Data

Processing of Structured Data with Spark SQL

Processing Streaming Data with Spark Streaming

Integrating Apache Spark with Third-Party Machine Learning Tools

Using Apache Spark for Graph Processing

Optimizing Apache Spark

Troubleshooting

Summary and Conclusion

Requirements

  • Experience with the Linux command line
  • A general understanding of data processing
  • Programming experience with Java, Scala, Python, or R

Audience

  • Developers

Duration

  • 21 hours

Related Courses

Python and Spark for Big Data (PySpark)

  21 hours

Introduction to Graph Computing

  28 hours

Apache Spark MLlib

  35 hours

Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP

  21 hours

Spark for Developers

  21 hours

Hortonworks Data Platform (HDP) for Administrators

  21 hours

Magellan: Geospatial Analytics on Spark

  14 hours

Alluxio: Unifying Disparate Storage Systems

  7 hours

Apache Spark SQL

  7 hours

A Practical Introduction to Stream Processing

  21 hours

Big Data Analytics in Health

  21 hours

Apache Spark in the Cloud

  21 hours

Apache Spark Streaming with Scala

  21 hours

SMACK Stack for Data Science

  14 hours

Apache Spark for .NET Developers

  21 hours