Course Outline

Introduction

  • Graph databases and libraries

Understanding Graph Data

  • The graph as a data structure
  • Using vertices (dots) and edges (lines) to model real-world scenarios
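Modeling a scenario with vertices and edges can be sketched as an adjacency list, the simplest in-memory graph structure. This is a minimal sketch. The helper `build_graph` and the friendship data are hypothetical illustrations and are not taken from any library covered in the course.

```python
def build_graph(edges, directed=False):
    """Return an adjacency list (vertex -> set of neighbours) built from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        # make sure v exists as a key even if it has no outgoing edges
        neighbours_of_v = adj.setdefault(v, set())
        if not directed:
            neighbours_of_v.add(u)
    return adj

# Hypothetical social scenario: people are vertices, friendships are edges.
friendships = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]
graph = build_graph(friendships)
print(sorted(graph["carol"]))  # ['alice', 'bob', 'dave']
```

Storing neighbour sets makes "who is connected to X?" a direct lookup, which is the property that graph-oriented tools are built around.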

Using Graph Databases to Model, Persist and Process Graph Data

  • Local graph algorithms/traversals
  • Neo4j, OrientDB and Titan
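A local traversal starts at one vertex and only touches its neighbourhood, for example "friends of friends". The sketch below shows that idea as a hop-limited breadth-first walk in plain Python. It is not the query language of Neo4j, OrientDB or Titan, and the function name `neighbourhood` is a hypothetical illustration.

```python
from collections import deque

def neighbourhood(adj, start, max_hops):
    """Breadth-first walk from start, returning {vertex: hop distance} up to max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand past the hop limit
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

adj = {"alice": {"bob"}, "bob": {"alice", "carol"},
       "carol": {"bob", "dave"}, "dave": {"carol"}}
print(neighbourhood(adj, "alice", 2))  # {'alice': 0, 'bob': 1, 'carol': 2}
```

The cost depends on the size of the visited neighbourhood rather than on the size of the whole graph, which is why such traversals suit graph databases.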

Exercise: Modeling Graph Data with Neo4j

  • Whiteboard data modeling

Beyond Graph Databases: Graph Computing

  • Understanding the property graph
  • Modeling different scenarios as graphs (software graph, discussion graph, concept graph)
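In a property graph, vertices and edges both carry a label and arbitrary key/value properties. A minimal sketch of that model, applied to a discussion graph, might look like this. The classes `Vertex`, `Edge` and `PropertyGraph`, the `out` method and the sample data are all hypothetical and do not mirror any particular product's API.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    vid: int
    label: str
    props: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: int
    dst: int
    label: str
    props: dict = field(default_factory=dict)

class PropertyGraph:
    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)

    def add_edge(self, src, dst, label, **props):
        self.edges.append(Edge(src, dst, label, props))

    def out(self, vid, label=None):
        """Vertices reached from vid over outgoing edges, optionally filtered by edge label."""
        return [self.vertices[e.dst] for e in self.edges
                if e.src == vid and (label is None or e.label == label)]

# Hypothetical discussion graph: users author posts, posts reply to posts.
g = PropertyGraph()
g.add_vertex(1, "user", name="alice")
g.add_vertex(2, "post", text="What is a property graph?")
g.add_vertex(3, "post", text="Vertices and edges with key/value properties.")
g.add_edge(1, 2, "authored", year=2024)
g.add_edge(3, 2, "replyTo")
print([v.props["text"] for v in g.out(1, "authored")])  # ['What is a property graph?']
```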

Solving Real-World Problems with Traversals

  • Algorithmic/directed walk over the graph
  • Determining circular dependencies
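Finding a circular dependency amounts to finding a cycle in a directed graph. One common technique, shown here as a sketch, is a depth-first walk that flags a "back edge" to a vertex still on the current path. The function `find_cycle` and the module graph are hypothetical examples.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of nodes, or None if the graph is acyclic.

    deps maps each module to the modules it depends on (directed edges).
    Depth-first search with three states: unvisited, on the current path, finished.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for dep in deps.get(node, ()):
            s = state.get(dep, WHITE)
            if s == GREY:  # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if s == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in list(deps):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Hypothetical software graph: module -> modules it imports.
modules = {"app": ["db", "ui"], "db": ["utils"], "ui": ["utils"], "utils": ["app"]}
print(find_cycle(modules))  # ['app', 'db', 'utils', 'app']
```

This recursive version is fine for small graphs. A very deep dependency chain would need an explicit stack to stay within Python's recursion limit.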

Case Study: Ranking Discussion Contributors

  • Ranking by number and depth of contributed discussions
  • A note on sentiment and concept analysis
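Ranking contributors by the number and depth of their discussions can be sketched over a reply tree. The scoring rule here is a hypothetical assumption, not the case study's method: each post is worth 1 plus its reply depth, so frequent posters who keep deep threads going rank highest. The helpers `reply_depth` and `rank_contributors` and the sample thread are illustrative only.

```python
from collections import defaultdict

def reply_depth(post, parent_of, cache):
    """Depth of a post in its reply tree (a thread's root post has depth 0)."""
    if post not in cache:
        parent = parent_of.get(post)
        cache[post] = 0 if parent is None else reply_depth(parent, parent_of, cache) + 1
    return cache[post]

def rank_contributors(posts):
    """posts: list of (post_id, author, parent_post_id or None).

    Hypothetical score: each post counts 1 + its reply depth.
    Returns (author, score) pairs, highest score first, ties broken by name.
    """
    parent_of = {pid: parent for pid, _, parent in posts}
    cache = {}
    scores = defaultdict(int)
    for pid, author, _ in posts:
        scores[author] += 1 + reply_depth(pid, parent_of, cache)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

thread = [
    ("p1", "alice", None),  # root post, depth 0
    ("p2", "bob", "p1"),    # depth 1
    ("p3", "alice", "p2"),  # depth 2
    ("p4", "carol", "p1"),  # depth 1
    ("p5", "bob", "p3"),    # depth 3
]
print(rank_contributors(thread))  # [('bob', 6), ('alice', 4), ('carol', 2)]
```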

Graph Computing: Local, In-Memory Graph Toolkits

  • Graph analysis and visualization
  • JUNG, NetworkX, and igraph

Exercise: Modeling Graph Data with NetworkX

  • Using NetworkX to model a complex system

Graph Computing: Batch Processing Graph Frameworks

  • Leveraging Hadoop for storage (HDFS) and processing (MapReduce)
  • Overview of iterative algorithms
  • Hama, Giraph, and GraphLab

Graph Computing: Graph-Parallel Computation

  • Unifying ETL, exploratory analysis, and iterative graph computation within a single system
  • GraphX

Setup and Installation

  • Hadoop and Spark

GraphX Operators

  • Property, structural, join, neighborhood aggregation, caching and uncaching
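Neighborhood aggregation means every edge can send a message to its endpoints, and messages arriving at the same vertex are merged. GraphX exposes this as `aggregateMessages`, with a send function and a merge function. The plain-Python analogue below only illustrates those semantics. `aggregate_messages`, `Triplet` and the follower data are hypothetical names, not the GraphX API.

```python
from collections import namedtuple

Triplet = namedtuple("Triplet", "src_id src_attr dst_id dst_attr edge_attr")

def aggregate_messages(vertices, edges, send, merge):
    """Plain-Python analogue of neighborhood aggregation.

    vertices: {vertex_id: attr}; edges: [(src, dst, edge_attr)].
    send(triplet) returns (target_vertex_id, message) pairs;
    merge(a, b) combines two messages bound for the same vertex.
    Vertices that receive no message are absent from the result.
    """
    result = {}
    for src, dst, attr in edges:
        t = Triplet(src, vertices[src], dst, vertices[dst], attr)
        for target, msg in send(t):
            result[target] = merge(result[target], msg) if target in result else msg
    return result

# Hypothetical follower graph: vertex attribute = age, edge (src, dst) = src follows dst.
ages = {1: 30, 2: 25, 3: 40}
follows = [(1, 2, None), (3, 2, None), (2, 3, None)]

# Each follower sends (1, its age); merging sums both parts per followed user.
sums = aggregate_messages(ages, follows,
                          send=lambda t: [(t.dst_id, (1, t.src_attr))],
                          merge=lambda a, b: (a[0] + b[0], a[1] + b[1]))
avg = {v: total / n for v, (n, total) in sums.items()}
print(avg)  # {2: 35.0, 3: 25.0}
```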

Iterating with Pregel API

  • Passing arguments for sending, receiving and computing
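GraphX's Pregel operator takes an initial message plus three functions: a vertex program that updates a vertex from its incoming message, a send function that emits messages along edges, and a merge function that combines messages for the same vertex. The superstep loop can be sketched in plain Python as follows, using single-source shortest paths as the example. The function `pregel` and its parameter names are hypothetical and only approximate the semantics; they are not the Spark API.

```python
import math

def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_iterations=20):
    """Plain-Python sketch of the Pregel superstep loop.

    vertices: {id: attr}; edges: [(src, dst, edge_attr)].
    vprog(id, attr, msg) -> new attr
    send_msg(src, src_attr, dst, dst_attr, edge_attr) -> list of (target, msg)
    merge_msg(a, b) -> combined msg
    """
    attrs = {v: vprog(v, a, initial_msg) for v, a in vertices.items()}
    active = set(attrs)  # every vertex ran the vertex program in superstep 0
    for _ in range(max_iterations):
        inbox = {}
        for src, dst, e in edges:
            if src not in active:
                continue  # only vertices that received messages last round send again
            for target, msg in send_msg(src, attrs[src], dst, attrs[dst], e):
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages in flight: the computation has converged
        for v, msg in inbox.items():
            attrs[v] = vprog(v, attrs[v], msg)
        active = set(inbox)
    return attrs

# Hypothetical weighted graph; shortest distances from "a".
source = "a"
verts = {v: 0.0 if v == source else math.inf for v in "abcd"}
edges = [("a", "b", 4.0), ("a", "c", 1.0), ("c", "b", 2.0), ("b", "d", 1.0)]
result = pregel(
    verts, edges, math.inf,
    vprog=lambda v, dist, msg: min(dist, msg),
    send_msg=lambda s, sd, d, dd, w: [(d, sd + w)] if sd + w < dd else [],
    merge_msg=min,
)
print(result)  # {'a': 0.0, 'b': 3.0, 'c': 1.0, 'd': 4.0}
```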

Building a Graph

  • Using vertices and edges in an RDD or on disk

Designing Scalable Algorithms

  • GraphX Optimization

Accessing Additional Algorithms

  • PageRank, Connected Components, Triangle Counting
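Two of these algorithms are small enough to sketch in plain Python. GraphX's connected components labels each component with the ID of its lowest-numbered vertex, and the label-propagation loop below reaches the same labeling. Triangle counting here counts, per vertex, the connected pairs among its neighbours. Both functions are hypothetical single-machine illustrations, not the distributed GraphX implementations.

```python
def connected_components(vertices, edges):
    """Label every vertex with the smallest vertex id in its (undirected) component,
    by repeatedly propagating the minimum label across edges until nothing changes."""
    label = {v: v for v in vertices}
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            low = min(label[u], label[v])
            for x in (u, v):
                if label[x] != low:
                    label[x] = low
                    changed = True
    return label

def triangle_count(vertices, edges):
    """Number of triangles each vertex belongs to (edge direction ignored)."""
    nbrs = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    # a triangle through v is a pair of v's neighbours that are themselves connected
    return {v: sum(1 for a in nbrs[v] for b in nbrs[v] if a < b and b in nbrs[a])
            for v in vertices}

vs = [1, 2, 3, 4, 5, 6]
es = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)]
print(connected_components(vs, es))  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 5, 6: 5}
print(triangle_count(vs, es))        # {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0}
```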

Exercise: PageRank and Top Users

  • Building and processing graph data using text files as input
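The exercise flow (read an edge list from text, build the graph, rank vertices, pick the top users) can be sketched on a single machine. The assumed input format is one "follower followee" pair per line. `parse_edges` and `pagerank` are hypothetical helpers implementing the textbook damped PageRank with rank from dangling vertices spread evenly. GraphX ships its own PageRank, whose normalization may differ.

```python
def parse_edges(lines):
    """Parse 'src dst' lines (e.g. read from a followers text file); skip blanks and # comments."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((src, dst))
    return edges

def pagerank(edges, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank; ranks sum to 1."""
    nodes = sorted({n for e in edges for n in e})
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # rank held by vertices with no out-links is spread evenly over all vertices
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank

text = """# follower followee
alice bob
carol bob
dave bob
bob alice
dave carol
"""
ranks = pagerank(parse_edges(text.splitlines()))
top = sorted(ranks, key=ranks.get, reverse=True)[:2]
print(top)  # ['bob', 'alice']
```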

Deploying to Production

Closing Remarks

Requirements

  • An understanding of Java programming and frameworks
  • A general understanding of Python is helpful but not required
  • A general understanding of database concepts

Audience

  • Developers

28 hours


Related Courses

Apache Jena: Creating a Semantic Web Application

 21 hours

Apache Jena is an open source Java framework for building Semantic Web and Linked Data applications. In this instructor-led, live training, participants will learn how to use Apache Jena to build and deploy a Semantic Web Application.  By

Blazegraph: Creating a Graph Database Application

 21 hours

Blazegraph is an open source, Java-based RDF graph database for storing and representing data with complex relationships. It supports Blueprints and RDF/SPARQL 1.1. In this instructor-led, live training, participants will learn how to use

Flockdb: A Simple Graph Database for Social Media

 7 hours

FlockDB is an open source distributed, fault-tolerant graph database for managing wide but shallow network graphs. It was initially used by Twitter to store relationships among users. In this instructor-led, live training, participants will learn


 14 hours

JanusGraph is a graph database for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. This instructor-led, live training (online or onsite) is aimed at engineers who wish

Introduction to Semantic MediaWiki

 7 hours

MediaWiki is free and open-source wiki software. This one-day course provides participants with an introduction to Semantic MediaWiki.

Beyond the Relational Database: Neo4j

 21 hours

Relational, table-based databases such as Oracle and MySQL have long been the standard for organizing and storing data. However, the growing size and fluidity of data have made it difficult for these traditional systems to efficiently execute highly

Building Graph Databases with Neo4j AuraDB

 14 hours

Neo4j AuraDB is a fully-managed graph database service. It is fast, reliable, and fully-automated, making it easy to build graph database applications in the cloud. This instructor-led, live training (online or onsite) is aimed at developers who

Semantic Web Overview

 7 hours

The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. The Semantic Web provides a common framework that allows data to be shared and reused across


 14 hours

SPARQL is a query language for querying RDF (Resource Description Framework) data. It is similar to SQL for relational data in databases. This instructor-led, live training (online or onsite) is aimed at technical persons who wish to query