Course Outline

Introduction to Programming Big Data with R (pbdR)

  • Setting up your environment to use pbdR
  • Scope and tools available in pbdR
  • Packages commonly used with Big Data alongside pbdR

Message Passing Interface (MPI)

  • Using pbdR MPI
  • Parallel processing
  • Point-to-point communication
  • Send Matrices
  • Summing Matrices
  • Collective communication
  • Summing Matrices with Reduce
  • Scatter / Gather
  • Other MPI communications

Distributed Matrices

  • Creating a distributed diagonal matrix
  • SVD of a distributed matrix
  • Building a distributed matrix in parallel

Statistics Applications

  • Monte Carlo Integration
  • Reading Datasets
  • Reading on all processes
  • Broadcasting from one process
  • Reading partitioned data
  • Distributed Regression
  • Distributed Bootstrap

Duration: 21 hours
