Course Outline
Introduction
SMACK Stack Overview
- What is Apache Spark? Apache Spark features
- What is Apache Mesos? Apache Mesos features
- What is Apache Akka? Apache Akka features
- What is Apache Cassandra? Apache Cassandra features
- What is Apache Kafka? Apache Kafka features
Scala Language
- Scala syntax and structure
- Scala control flow
Preparing the Development Environment
- Installing and configuring the SMACK stack
- Installing and configuring Docker
Apache Akka
- Using actors
Apache Cassandra
- Creating a database for read operations
- Working with backups and recovery
Connectors
- Creating a stream
- Building an Akka application
- Storing data with Cassandra
- Reviewing connectors
Apache Kafka
- Working with clusters
- Creating, publishing, and consuming messages
Apache Mesos
- Allocating resources
- Running clusters
- Working with Apache Aurora and Docker
- Running services and jobs
- Deploying Spark, Cassandra, and Kafka on Mesos
Apache Spark
- Managing data flows
- Working with RDDs and dataframes
- Performing data analysis
Troubleshooting
- Handling failure of services and errors
Summary and Conclusion
Requirements
- An understanding of data processing systems
Audience
- Data Scientists
Testimonials
Richard is very calm and methodical, with an analytic insight - exactly the qualities needed to present this sort of course.
Kieran Mac Kenna
Sharing concept diagrams and also samples for getting our hands dirty.
Mark Yang - FMR
Applicable scenarios and cases
zhaopeng liu - Fmr
case analysis
国栋 张
all parts of this session
Eric Han - Fmr
We know a lot more about the whole environment.
John Kidd
The trainer made the class interesting and entertaining which helps quite a bit with all day training.
Ryan Speelman
I think the trainer had an excellent style of combining humor and real life stories to make the subjects at hand very approachable. I would highly recommend this professor in the future.
Ernesto did a great job explaining the high level concepts of using Spark and its various modules.
Michael Nemerouf
Richard was very willing to digress when we wanted to ask semi-related questions about things not on the syllabus. Explanations were clear and he was up front about caveats in any advice he gave us.
- ARM Limited
I liked the VM very much. The teacher was very knowledgeable regarding the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Small group (4 trainees), so we could progress together. Also, the trainer could help everybody.
- ICE International Copyright Enterprise Germany GmbH
Ajay was very friendly, helpful and also knowledgeable about the topic he was discussing.
Biniam Guulay - ICE International Copyright Enterprise Germany GmbH
The lab exercises. Applying the theory from the first day in subsequent days.
- Dell
The trainer was passionate and knew his subject well. I appreciate his help, his answers to all our questions, and the cases he suggested.
Doing similar exercises in different ways really helps in understanding what each component (Hadoop/Spark, standalone/cluster) can do on its own and together. It gave me ideas on how I should test my application on my local machine when I develop vs when it is deployed on a cluster.
Thomas Carcaud - IT Frankfurt GmbH
Got to learn Spark Streaming, Databricks, and AWS Redshift.
Lim Meng Tee - Jobstreet.com Shared Services Sdn. Bhd.
The content and the knowledge.
Jobstreet.com Shared Services Sdn. Bhd.
It was very informative. I've had very little experience with Spark before and so far this course has provided a very good introduction to the subject.
Intelligent Medical Objects
It was great to get an understanding of what is going on under the hood of Spark. Knowing what's going on under the hood helps to better understand why your code is or is not doing what you expect it to do. A lot of the training was hands on which is always great and the section on optimizations was exceptionally relevant to my current work which was nice.
Intelligent Medical Objects
This is a great class! I most appreciate that Andras explains very clearly what Spark is all about, where it came from, and what problems it is able to solve. Much better than other introductions I've seen that just dive into how to use it. Andras has a deep knowledge of the topic and explains things very well.
Intelligent Medical Objects
The live examples that were given and showed the basic aspects of Spark.
Intelligent Medical Objects
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
Having hands on session / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
The trainer adjusted the training slightly based on audience requests, shedding some light on a few different topics that we had requested.
Intelligent Medical Objects
His pace was great. I loved the fact that he went into theory too, so that I understand WHY I would do the things he is asking.
Intelligent Medical Objects
Related Courses
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP
21 hours. This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and
Apache Spark MLlib
35 hours. MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative
Anaconda Ecosystem for Data Scientists
14 hours. Anaconda is a free distribution of Python and R programming languages for data science. It provides an easy-to-use platform that simplifies package management and deployment. This instructor-led, live training (online or onsite) is aimed at data
Big Data Business Intelligence for Telecom and Communication Service Providers
35 hours. Overview: Communications service providers (CSP) are facing pressure to reduce costs and maximize average revenue per user (ARPU), while ensuring an excellent customer experience, but data volumes keep growing. Global mobile data traffic will grow
Data Science Programme
245 hours. The explosion of information and data in today’s world is unparalleled; our ability to innovate and push the boundaries of the possible is growing faster than it ever has. The role of Data Scientist is one of the highest in-demand skills
Data Science for Big Data Analytics
35 hours. Big data refers to data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer,
Jupyter for Data Science Teams
7 hours. Jupyter is an open-source, web-based interactive IDE and computing environment. This instructor-led, live training introduces the idea of collaborative development in data science and demonstrates how to use Jupyter to track and participate as a
MATLAB Fundamentals, Data Science & Report Generation
35 hours. In the first part of this training, we cover the fundamentals of MATLAB and its function as both a language and a platform. Included in this discussion is an introduction to MATLAB syntax, arrays and matrices, data visualization, script
Python Programming for Finance
35 hours. Python is a programming language that has gained huge popularity in the financial industry. Adopted by the largest investment banks and hedge funds, it is being used to build a wide range of financial applications ranging from core trading programs
F# for Data Science
21 hours. Data science is the application of statistical analysis, machine learning, data visualization and programming for the purpose of understanding and interpreting real-world data. F# is a well-suited programming language for data science as it combines
Introduction to Graph Computing
28 hours. Many real world problems can be described in terms of graphs. For example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set
Kaggle
14 hours. Kaggle is a crowd-sourced platform for data scientists. It provides a platform for users to find and publish high-quality datasets, explore and build models in a web-based data-science environment, and work with other data scientists and machine
Accelerating Python Pandas Workflows with Modin
14 hours. Modin is a parallel data frame system designed to speed up Pandas workflows. It can be used to handle large datasets, leveraging Ray or Dask as the backend framework for distributed computing in Python. This instructor-led, live training (online
GPU Data Science with NVIDIA RAPIDS
14 hours. RAPIDS is a suite of open source software libraries built to accelerate GPU-driven data science and analytics pipelines. It is based on Python and includes a DataFrame API that integrates with a variety of machine learning algorithms. This
Python and Spark for Big Data (PySpark)
21 hours. Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this