Course Outline

Introduction

  • Spark NLP vs NLTK vs spaCy
  • Overview of Spark NLP features and architecture

Getting Started

  • Setup requirements
  • Installing Spark NLP
  • General concepts

Using Pre-trained Pipelines

  • Importing required modules
  • Default annotators
  • Loading a pipeline model
  • Transforming texts

Building NLP Pipelines

  • Understanding the pipeline API
  • Implementing NER models
  • Choosing embeddings
  • Using word, sentence, and universal embeddings

Classification and Inference

  • Document classification use cases
  • Sentiment analysis models
  • Training a document classifier
  • Using other machine learning frameworks
  • Managing NLP models
  • Optimizing models for low-latency inference


Summary and Next Steps

Requirements

  • Familiarity with Apache Spark
  • Python programming experience

Audience

  • Data scientists
  • Developers

Duration

  14 hours

