Course Outline

Introduction

SMACK Stack Overview

  • What is Apache Spark? Apache Spark features
  • What is Apache Mesos? Apache Mesos features
  • What is Akka? Akka features
  • What is Apache Cassandra? Apache Cassandra features
  • What is Apache Kafka? Apache Kafka features

Scala Language

  • Scala syntax and structure
  • Scala control flow

Preparing the Development Environment

  • Installing and configuring the SMACK stack
  • Installing and configuring Docker

Akka

  • Using actors

Apache Cassandra

  • Creating a database for read operations
  • Working with backups and recovery

Connectors

  • Creating a stream
  • Building an Akka application
  • Storing data with Cassandra
  • Reviewing connectors

Apache Kafka

  • Working with clusters
  • Creating, publishing, and consuming messages

Apache Mesos

  • Allocating resources
  • Running clusters
  • Working with Apache Aurora and Docker
  • Running services and jobs
  • Deploying Spark, Cassandra, and Kafka on Mesos

Apache Spark

  • Managing data flows
  • Working with RDDs and DataFrames
  • Performing data analysis

Troubleshooting

  • Handling service failures and errors

Summary and Conclusion

Requirements

  • An understanding of data processing systems

Audience

  • Data Scientists

Duration

  • 14 hours

Related Courses

Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP

 21 hours

This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and

Apache Spark MLlib

 35 hours

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative

Anaconda Ecosystem for Data Scientists

 14 hours

Anaconda is a free distribution of Python and R programming languages for data science. It provides an easy-to-use platform that simplifies package management and deployment. This instructor-led, live training (online or onsite) is aimed at data

Big Data Business Intelligence for Telecom and Communication Service Providers

 35 hours

Communications service providers (CSPs) are facing pressure to reduce costs and maximize average revenue per user (ARPU), while ensuring an excellent customer experience, but data volumes keep growing. Global mobile data traffic will grow

Data Science Programme

 245 hours

The explosion of information and data in today’s world is unparalleled, and our ability to innovate and push the boundaries of the possible is growing faster than it ever has. The role of Data Scientist is one of the most in-demand skills

Data Science for Big Data Analytics

 35 hours

Big data refers to data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer,

Jupyter for Data Science Teams

 7 hours

Jupyter is an open-source, web-based interactive IDE and computing environment. This instructor-led, live training introduces the idea of collaborative development in data science and demonstrates how to use Jupyter to track and participate as a

MATLAB Fundamentals, Data Science & Report Generation

 35 hours

In the first part of this training, we cover the fundamentals of MATLAB and its function as both a language and a platform. Included in this discussion is an introduction to MATLAB syntax, arrays and matrices, data visualization, script

Python Programming for Finance

 35 hours

Python is a programming language that has gained huge popularity in the financial industry. Adopted by the largest investment banks and hedge funds, it is being used to build a wide range of financial applications ranging from core trading programs

F# for Data Science

 21 hours

Data science is the application of statistical analysis, machine learning, data visualization and programming for the purpose of understanding and interpreting real-world data. F# is a well-suited programming language for data science as it combines

Introduction to Graph Computing

 28 hours

Many real-world problems can be described in terms of graphs. Examples include the Web graph, the social network graph, the train network graph, and the language graph. These graphs tend to be extremely large; processing them requires a specialized set

Kaggle

 14 hours

Kaggle is a crowd-sourced platform for data scientists. It provides a platform for users to find and publish high-quality datasets, explore and build models in a web-based data-science environment, and work with other data scientists and machine

Accelerating Python Pandas Workflows with Modin

 14 hours

Modin is a parallel data frame system designed to speed up Pandas workflows. It can be used to handle large datasets, leveraging Ray or Dask as the backend framework for distributed computing in Python. This instructor-led, live training (online

GPU Data Science with NVIDIA RAPIDS

 14 hours

RAPIDS is a suite of open source software libraries built to accelerate GPU-driven data science and analytics pipelines. It is based on Python and includes a DataFrame API that integrates with a variety of machine learning algorithms. This

Python and Spark for Big Data (PySpark)

 21 hours

Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this