Course Outline


  • Apache Arrow vs Parquet

Installing and Configuring Apache Arrow

Overview of Apache Arrow Features and Architecture

Exploring Data with Pandas and Apache Arrow

Exploring Data with Spark and Apache Arrow

Exploring Data with R and Apache Arrow

Exploring Data with MapD and Apache Arrow

Other Data Analysis Integrations

  • PySpark, Parquet files on S3, Oracle tables, and Elasticsearch indices


Summary and Conclusion


  • A basic understanding of SQL
  • Familiarity with Python or R
  • Some familiarity with Apache Spark
  14 Hours


Related Courses

Automated Monitoring with Zabbix

 14 hours

This course focuses on practical implementation and tooling, covering the installation, planning, and configuration of Zabbix.


 14 hours

Azure Databricks is a unified data analytics platform that allows users to store and visualize vast amounts of data from different sources. It provides a collaborative environment to build, deploy, and manage data analytics workloads easily. This

Data Cleaning

 7 hours

Data Cleaning or Data Cleansing refers to the process of detecting and fixing issues in a data set before analyzing it. This instructor-led, live training (online or onsite) is aimed at data scientists, data analysts, and business analysts who

Datadog Monitoring

 7 hours

Datadog is a monitoring platform for cloud-based applications that provides tools for monitoring servers and databases. It helps determine performance metrics and perform event monitoring for infrastructure and cloud-based services. This


 7 hours

Netdata is an open-source infrastructure performance monitoring and troubleshooting solution that simplifies real-time data collection of system, hardware, and application metrics. Netdata helps users visualize and store data, set performance issue

Zenoss Monitoring for Administrators

 21 hours

Zenoss Community Edition is an application, server, and network management platform for monitoring availability, inventory/configuration, performance, and events. It is based on the Zope application server. This instructor-led, live training

Fluentd for Log Data Unification

 14 hours

This instructor-led, live training (online or onsite) is aimed at engineers who wish to set up an architecture where everything is logged. By the end of this training, participants will be able to: Install and configure Fluentd. Collect

KNIME Analytics Platform for BI

 21 hours

KNIME Analytics Platform is a leading open source option for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. With more than 1000 modules, hundreds of ready-to-run

Microsoft Power Platform Fundamentals

 14 hours

Microsoft Power Platform is a platform made up of three Microsoft products: Power BI, PowerApps, and Power Automate. These products use low-code programming to help users build simple apps, create automated workflows, and generate business

Monitoring Your Resources with Munin

 7 hours

Munin is an open-source monitoring tool that helps system administrators monitor resources such as servers, workstations, networks, SANs, applications, network devices, etc. It shows resource trends and provides insights into questions such as


 35 hours

This 5-day course demonstrates the fundamentals of Nagios through hands-on practice.

Nagios Core

 21 hours

This course covers the installation, planning, and configuration of Nagios Core. The level of this course is Intermediate.

Nagios XI Administration

 21 hours

Nagios XI is enterprise server and network monitoring software. In this instructor-led, live training, participants will learn how to set up and operate Nagios XI as they step through the process of managing Linux and Windows servers in a series

Sensu: Beginner to Advanced

 14 hours

Sensu is a telemetry and monitoring service for multi-cloud infrastructures at scale. Sensu is aimed at dynamic infrastructures that require a change in approach to monitoring systems that traditional monitoring systems cannot provide. This

SPSS Modeler

 14 hours

IBM SPSS Modeler is software used for data mining and text analytics. It provides a set of data mining tools that can build predictive models and perform data analytic tasks. This instructor-led, live training (online or onsite) is aimed at