Course Outline

Introduction

Overview of Data Access Approaches (Hive, databases, etc.)

Overview of Spark Features and Architecture

Installing and Configuring Spark

Understanding DataFrames in Spark

Defining Tables and Importing Datasets

Querying DataFrames using SQL

Carrying out Aggregations, JOINs and Nested Queries

Uploading and Accessing Data

Querying Different Types of Data

  • JSON, Parquet, etc.

Querying Data Lakes with SQL

Troubleshooting

Summary and Conclusion

Requirements

  • Experience with SQL queries
  • Programming experience in any language

Audience

  • Data analysts
  • Data scientists
  • Data engineers

Duration

  7 hours

Related Courses

Python and Spark for Big Data (PySpark)

  21 hours

Introduction to Graph Computing

  28 hours

Apache Spark MLlib

  35 hours

Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP

  21 hours

Spark for Developers

  21 hours

Hortonworks Data Platform (HDP) for Administrators

  21 hours

Magellan: Geospatial Analytics on Spark

  14 hours

Alluxio: Unifying Disparate Storage Systems

  7 hours

A Practical Introduction to Stream Processing

  21 hours

Big Data Analytics in Health

  21 hours

Apache Spark in the Cloud

  21 hours

Apache Spark Streaming with Scala

  21 hours

SMACK Stack for Data Science

  14 hours

Apache Spark Fundamentals

  21 hours

Apache Spark for .NET Developers

  21 hours