Course Outline
Introduction
Understanding Big Data
Overview of Spark
Overview of Python
Overview of PySpark
- Distributing Data Using Resilient Distributed Datasets Framework
- Distributing Computation Using Spark API Operators
Setting Up Python with Spark
Setting Up PySpark
Using Amazon Web Services (AWS) EC2 Instances for Spark
Setting Up Databricks
Setting Up the AWS EMR Cluster
Learning the Basics of Python Programming
- Getting Started with Python
- Using the Jupyter Notebook
- Using Variables and Simple Data Types
- Working with Lists
- Using if Statements
- Using User Inputs
- Working with while Loops
- Implementing Functions
- Working with Classes
- Working with Files and Exceptions
- Working with Projects, Data, and APIs
Learning the Basics of Spark DataFrame
- Getting Started with Spark DataFrames
- Implementing Basic Operations with Spark
- Using Groupby and Aggregate Operations
- Working with Timestamps and Dates
Working on a Spark DataFrame Project Exercise
Understanding Machine Learning with MLlib
Working with MLlib, Spark, and Python for Machine Learning
Understanding Regressions
- Learning Linear Regression Theory
- Implementing a Regression Evaluation Code
- Working on a Sample Linear Regression Exercise
- Learning Logistic Regression Theory
- Implementing a Logistic Regression Code
- Working on a Sample Logistic Regression Exercise
Understanding Random Forests and Decision Trees
- Learning Tree Methods Theory
- Implementing Decision Trees and Random Forest Codes
- Working on a Sample Random Forest Classification Exercise
Working with K-means Clustering
- Understanding K-means Clustering Theory
- Implementing a K-means Clustering Code
- Working on a Sample Clustering Exercise
Working with Recommender Systems
Implementing Natural Language Processing
- Understanding Natural Language Processing (NLP)
- Overview of NLP Tools
- Working on a Sample NLP Exercise
Streaming with Spark on Python
- Overview of Streaming with Spark
- Sample Spark Streaming Exercise
Closing Remarks
Requirements
- General programming skills
Audience
- Developers
- IT Professionals
- Data Scientists
Testimonials
Practice tasks.
Pawel Kozikowski - GE Medical Systems Polska Sp. z o.o.
Organization; trainer's expertise with the subject.
ENGIE - 101 Arch Street
The teacher has adapted the training program to our current needs.
EduBroker Sp. z o.o.
The lessons were taught in a Jupyter notebook. The topics followed a logical sequence and naturally developed the session from the easier parts to the more complex ones. I'm already an advanced user of Python with a background in Machine Learning, so I found the course easier to follow than, possibly, some of my classmates did. I appreciate that some of the most elementary concepts were skipped and that the trainer focused on the most substantial matters.
Angela DeLaMora - ADT, LLC
Hands-on training.
Abraham Thomas - PPL
Individual attention.
ARCHANA ANILKUMAR - PPL
Related Courses
Scaling Data Analysis with Python and Dask
14 hours
Dask is a flexible and high-performance Python library for parallel computing. It scales and accelerates big data processing with other Python-based data science libraries, such as Pandas, NumPy, and Scikit-Learn. This instructor-led, live
Data Analysis with Python, Pandas, and Numpy
14 hours
Pandas is a Python package that provides data structures for working with structured (tabular, multidimensional, potentially heterogeneous) and time series data.
Accelerating Python Pandas Workflows with Modin
14 hours
Modin is a parallel data frame system designed to speed up Pandas workflows. It can be used to handle large datasets, leveraging Ray or Dask as the backend framework for distributed computing in Python. This instructor-led, live training (online
Machine Learning with Python and Pandas
14 hours
Pandas is a Python library for data manipulation and analysis. Using Pandas, users can perform predictive analysis through machine learning. This instructor-led, live training (online or onsite) is aimed at data scientists who wish to use Pandas
FARM (FastAPI, React, and MongoDB) Full Stack Development
14 hours
FARM (FastAPI, React, and MongoDB) is similar to MERN, but performs faster with Python and FastAPI replacing Node.js and Express as the backend. FastAPI is a high-performance Python web framework used by top companies, such as Microsoft, Uber, and
Developing APIs with Python and FastAPI
14 hours
FastAPI is an open source, high-performance web framework for building APIs with Python. It is used by many large companies, such as Uber, Netflix, and Microsoft. This instructor-led, live training (online or onsite) is aimed at developers who
Web application development with Flask
14 hours
This practical course is addressed to Python developers who want to create and maintain their first web applications. It is also addressed to people who are already familiar with other web frameworks such as Django or Web2py, and want to learn
Advanced Flask
14 hours
Flask is a micro-framework for developing web applications in Python. Unlike other frameworks, Flask does not have any dependencies on external libraries, making it lightweight and fast. This instructor-led, live training (online or onsite) is
Build REST APIs with Python and Flask
14 hours
Flask is a micro-framework for developing web services in Python. Flask, unlike other frameworks, does not have any dependencies on external libraries, making it lightweight and fast. This instructor-led, live training (online or onsite) is aimed
Kivy: Building Android Apps with Python
7 hours
Kivy is an open-source, cross-platform graphical user interface library written in Python, which allows multi-touch application development for a wide selection of devices. In this instructor-led, live training, participants will learn how to
Game Development with PyGame
7 hours
PyGame is an open source library of Python modules for developing game applications and programs. It is lightweight, easy to use, and compatible with any operating system or platform. This instructor-led, live training (online or onsite) is aimed
GUI Programming with Python and PyQt
21 hours
PyQt is a cross-platform library for developing GUIs (graphical user interfaces) for Python applications. It interfaces Python with the Qt GUI toolkit. This instructor-led, live training (online or onsite) is aimed at persons who wish to program
Scientific Computing with Python SciPy
7 hours
SciPy is an open source Python library for scientific, mathematical, and technical computing. It is built on the NumPy extension, providing a wide range of functionalities for performing complex numerical operations. This instructor-led, live
GUI Programming with Python and Tkinter
14 hours
Tkinter is the most commonly used Python GUI (graphical user interface) programming toolkit, and it is the standard GUI package for Python. Tkinter is an object-oriented layer wrapped over the Tk GUI toolkit. This instructor-led, live
Web Development with Web2Py
28 hours
Web2py is a Python-based, free, open source full-stack framework for rapid development of fast, scalable, secure, and portable database-driven web applications.
Audience
This course is directed at Engineers and Developers using web2py as a