Course Outline
Introduction
Scala Programming: In-Depth Review
- Syntax and structure
- Flow control and functions
Spark Internals
- Resilient Distributed Datasets (RDD)
- Spark script to graph to cluster
Overview of Spark Streaming
- Streaming architecture
- Intervals in streaming
- Fault tolerance
Preparing the Development Environment
- Installing and configuring Apache Spark
- Installing and configuring the Scala IDE
- Installing and configuring JDK
Spark Streaming Beginner to Advanced
- Working with key/value RDDs
- Filtering RDDs
- Improving Spark scripts with regular expressions
- Sharing data on a cluster
- Working with network data sets
- Implementing BFS algorithms
- Creating Spark driver scripts
- Tracking in real time with scripts
- Writing continuous applications
- Streaming linear regression
- Using the Spark Machine Learning Library (MLlib)
Spark and Clusters
- Bundling dependencies and Spark scripts using the SBT tool
- Using Amazon EMR to illustrate clusters
- Optimizing by partitioning RDDs
- Using Spark logs
Integration in Spark Streaming
- Integrating Apache Kafka and working with Kafka topics
- Integrating Apache Flume and working with pull-based/push-based Flume configurations
- Writing a custom receiver class
- Integrating Cassandra and exposing data as real-time services
In Production
- Packaging an application and running it with spark-submit
- Troubleshooting, tuning, and debugging Spark Jobs and clusters
Summary and Conclusion
Requirements
- Programming and scripting experience
Audience
- Software Engineers
Testimonials
Richard is very calm and methodical, with an analytic insight - exactly the qualities needed to present this sort of course.
Kieran Mac Kenna
Sharing concept diagrams and also samples to get our hands dirty.
Mark Yang - FMR
Applicable scenarios and cases
zhaopeng liu - Fmr
case analysis
国栋 张
all parts of this session
Eric Han - Fmr
We know a lot more about the whole environment.
John Kidd
The trainer made the class interesting and entertaining which helps quite a bit with all day training.
Ryan Speelman
I think the trainer had an excellent style of combining humor and real life stories to make the subjects at hand very approachable. I would highly recommend this professor in the future.
Ernesto did a great job explaining the high level concepts of using Spark and its various modules.
Michael Nemerouf
Richard was very willing to digress when we wanted to ask semi-related questions about things not on the syllabus. Explanations were clear and he was up front about caveats in any advice he gave us.
- ARM Limited
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Small group (4 trainees), so we could progress together. The trainer could also help everybody.
- ICE International Copyright Enterprise Germany GmbH
Ajay was very friendly, helpful and also knowledgeable about the topic he was discussing.
Biniam Guulay - ICE International Copyright Enterprise Germany GmbH
The lab exercises. Applying the theory from the first day in subsequent days.
- Dell
The trainer was passionate and knew his subject well. I appreciate his help; he answered all our questions and suggested cases.
Doing similar exercises in different ways really helps in understanding what each component (Hadoop/Spark, standalone/cluster) can do on its own and together. It gave me ideas on how I should test my application on my local machine when I develop vs. when it is deployed on a cluster.
Thomas Carcaud - IT Frankfurt GmbH
Got to learn Spark Streaming, Databricks, and AWS Redshift.
Lim Meng Tee - Jobstreet.com Shared Services Sdn. Bhd.
The content and the knowledge.
Jobstreet.com Shared Services Sdn. Bhd.
It was very informative. I've had very little experience with Spark before and so far this course has provided a very good introduction to the subject.
Intelligent Medical Objects
It was great to get an understanding of what is going on under the hood of Spark. Knowing what's going on under the hood helps to better understand why your code is or is not doing what you expect it to do. A lot of the training was hands on which is always great and the section on optimizations was exceptionally relevant to my current work which was nice.
Intelligent Medical Objects
This is a great class! I most appreciate that Andras explains very clearly what Spark is all about, where it came from, and what problems it is able to solve. Much better than other introductions I've seen that just dive into how to use it. Andras has a deep knowledge of the topic and explains things very well.
Intelligent Medical Objects
The live examples that were given and showed the basic aspects of Spark.
Intelligent Medical Objects
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
Having hands on session / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
The trainer adjusted the training slightly based on audience requests, shedding some light on a few different topics that we had requested.
Intelligent Medical Objects
His pace was great. I loved the fact that he went into theory too, so that I understand WHY I would do the things he is asking.
Intelligent Medical Objects
Related Courses
Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP
21 hours
This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and
Apache Spark MLlib
35 hours
MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative
Akka - from Beginner to Intermediate
21 hours
This training outline is intended to bring attendees from a beginner to an intermediate/advanced level in the understanding and knowledge of the Akka framework. The entire course is hands on, mostly driven by the trainer in the beginning and
Alluxio: Unifying Disparate Storage Systems
7 hours
Alluxio is an open-source virtual distributed storage system that unifies disparate storage systems and enables applications to interact with data at memory speed. It is used by companies such as Intel, Baidu and Alibaba. In this instructor-led,
Introduction to Graph Computing
28 hours
Many real world problems can be described in terms of graphs. For example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set
Hortonworks Data Platform (HDP) for Administrators
21 hours
Hortonworks Data Platform (HDP) is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem. This instructor-led, live training (online or onsite) introduces
Magellan: Geospatial Analytics on Spark
14 hours
Magellan is an open-source distributed execution engine for geospatial analytics on big data. Implemented on top of Apache Spark, it extends Spark SQL and provides a relational abstraction for geospatial analytics. This instructor-led, live
Machine Learning Fundamentals with Scala and Apache Spark
14 hours
The aim of this course is to provide a basic proficiency in applying Machine Learning methods in practice. Through the use of the Scala programming language and its various libraries, and based on a multitude of practical examples this course
Scala: Advanced Object-Functional Programming
14 hours
Scala is a concise, object-oriented language with functional programming features, including currying, type inference, immutability, lazy evaluation, and pattern matching. Scala code runs on a JVM and was designed to address some of the shortcomings
Scala: Advanced Functional Programming
14 hours
Scala is a concise, object-oriented language with functional programming features, including currying, type inference, immutability, lazy evaluation, and pattern matching. In this instructor-led, live training participants will learn how to use
Property Based Testing with ScalaCheck
21 hours
ScalaCheck is a library for carrying out automated, property-based testing for Scala or Java programs. Inspired by the Haskell library QuickCheck, it uses properties to describe the expected behavior of an application, generating random input data
Programming in Scala
14 hours
The training aims to provide an opportunity to learn the Scala language, its syntax, its programming paradigms, and its areas of application.
Spark for Developers
21 hours
OBJECTIVE: This course will introduce Apache Spark. The students will learn how Spark fits into the Big Data ecosystem, and how to use Spark for data analysis. The course covers Spark shell for interactive data analysis, Spark
Apache Spark SQL
7 hours
Spark SQL is Apache Spark's module for working with structured and unstructured data. Spark SQL provides information about the structure of the data as well as the computation being performed. This information can be used to perform
Python and Spark for Big Data (PySpark)
21 hours
Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this