Course Outline
Introduction
Installing and Configuring Dataiku Data Science Studio (DSS)
- System requirements for Dataiku DSS
- Setting up Apache Hadoop and Apache Spark integrations
- Configuring Dataiku DSS with web proxies
- Migrating from other platforms to Dataiku DSS
Overview of Dataiku DSS Features and Architecture
- Core objects and graphs foundational to Dataiku DSS
- What is a recipe in Dataiku DSS?
- Types of datasets supported by Dataiku DSS
Creating a Dataiku DSS Project
Defining Datasets to Connect to Data Resources in Dataiku DSS
- Working with DSS connectors and file formats
- Standard DSS formats vs. Hadoop-specific formats
- Uploading files for a Dataiku DSS project
Overview of the Server Filesystem in Dataiku DSS
Creating and Using Managed Folders
- Using the Dataiku DSS Merge Folder recipe
- Local vs. non-local managed folders
Constructing a Filesystem Dataset Using Managed Folder Contents
- Performing cleanups with a DSS code recipe
Working with the Metrics Dataset and the Internal Stats Dataset
Implementing the DSS Download Recipe for HTTP Datasets
Relocating SQL Datasets and HDFS Datasets Using DSS
Ordering Datasets in Dataiku DSS
- Write ordering vs. read-time ordering
Exploring and Preparing Data Visuals for a Dataiku DSS Project
Overview of Dataiku Schemas, Storage Types, and Meanings
Writing Data Cleansing, Normalization, and Enrichment Scripts in Dataiku DSS
Working with the Dataiku DSS Charts Interface and Types of Visual Aggregations
Utilizing the Interactive Statistics Feature of DSS
- Univariate analysis vs. bivariate analysis
- Using the DSS Principal Component Analysis (PCA) tool
Overview of Machine Learning with Dataiku DSS
- Supervised ML vs. unsupervised ML
- Reference for DSS ML algorithms and feature handling
- Deep Learning with Dataiku DSS
Overview of the Flow Derived from DSS Datasets and Recipes
Transforming Existing Datasets in DSS with Visual Recipes
Utilizing DSS Recipes Based on User-Defined Code
Optimizing Code Exploration and Experimentation with DSS Code Notebooks
Writing Advanced DSS Visualizations and Custom Frontend Features with Webapps
Working with the Dataiku DSS Code Reports Feature
Sharing Project Elements and Getting Familiar with DSS Dashboards
Designing and Packaging a Dataiku DSS Project as a Reusable Application
Overview of Advanced Methods in Dataiku DSS
- Implementing optimized dataset partitioning in DSS
- Running specific parts of DSS processing in Kubernetes containers
Overview of Collaboration and Version Control in Dataiku DSS
Implementing Automation Scenarios, Metrics, and Checks for DSS Project Testing
Deploying and Updating a Project with the DSS Automation Node and Bundles
Working with Real-Time APIs in Dataiku DSS
- Additional APIs and REST APIs in DSS
Analyzing and Forecasting Time Series in Dataiku DSS
Securing a Project in Dataiku DSS
- Managing project permissions and dashboard authorizations
- Implementing advanced security options
Integrating Dataiku DSS with the Cloud
Troubleshooting
Summary and Conclusion
Requirements
- Experience with the Python, SQL, and R programming languages
- Basic knowledge of data processing with Apache Hadoop and Spark
- Comprehension of machine learning concepts and data models
- Background in statistical analyses and data science concepts
- Experience with visualizing and communicating data
Audience
- Engineers
- Data Scientists
- Data Analysts
Testimonials
It was very interactive and more relaxed and informal than expected. We covered lots of topics in the time, and the trainer was always receptive to talking in more detail or more generally about the topics and how they were related. I feel the training has given me the tools to continue learning, as opposed to it being a one-off session where learning stops once you've finished, which is very important given the scale and complexity of the topic.
Jonathan Blease
The trainer was so knowledgeable and included areas I was interested in.
Mohamed Salama
The trainer very easily explained difficult and advanced topics.
Leszek K
I liked all of it.
蒙 李
Communication with lecturers
文欣 张
Liked it all.
lisa xie
I genuinely liked the exercises.
- L M ERICSSON LIMITED
I liked the lab exercises.
Marcell Lorant - L M ERICSSON LIMITED
The Jupyter notebook format in which the training material is available.
- L M ERICSSON LIMITED
There were many exercises and interesting topics.
- L M ERICSSON LIMITED
Some great lab exercises analyzed and explained by the trainer in depth (e.g. covariates in linear regression, matching the real function).
- L M ERICSSON LIMITED
It's just great that all the material, including the exercises, is on the same page and gets updated on the fly. The solution is revealed at the end. Cool! I also appreciate that Krzysztof took extra effort to understand our problems and suggested possible techniques to us.
Attila Nagy - L M ERICSSON LIMITED
It shows many methods with pre-prepared scripts: very nicely prepared materials that are easy to trace back.
Kamila Begej - GE Medical Systems Polska Sp. Zoo
I like that the training was focused on examples and coding. I thought it was impossible to pack so much content into three days of training, but I was wrong. The training covered many topics, and everything was done in a very detailed manner (especially the tuning of model parameters; I didn't expect there would be time for this and I was greatly surprised).
Bartosz Rosiek - GE Medical Systems Polska Sp. Zoo
Issues discussed, exercises carried out (examples), atmosphere of training, contact with the trainer, location.
- Wojskowe Zakłady Uzbrojenia S.A. w Grudziądzu
I like that it focuses more on the how-to of the different text summarization methods
The trainer was a professional in the subject field and related theory to application excellently.
Fahad Malalla - Tatweer Petroleum
Ewa has a passion for the subject and a huge wealth of knowledge. She impressed all of us with her knowledge and kept us all focused through the day.
Rock Solid Knowledge Ltd
Even though I had to miss a day due to customer meetings, I feel I have a much clearer understanding of the processes and techniques used in Machine Learning and when I would use one approach over another. Our challenge now is to practice what we have learned and start to apply it to our problem domain.
Richard Blewett - Rock Solid Knowledge Ltd
So much breadth and so many topics covered. I felt it was a huge subject to try to cover in 3 days; the trainer did what they could to cover everything almost exactly on time!
Rock Solid Knowledge Ltd
Adjusting to our needs
Sumitomo Mitsui Finance and Leasing Company, Limited
convolution filter
Francesco Ferrara - Inpeco SpA
The enthusiasm for the topic. The examples he gave and how well he explained them. Likeable. A little too detailed for beginners; for managers it could be more abstract and take fewer days. But it was designed to fit our needs, and we had good alignment in advance.
Benedikt Chiandetti - HDI Deutschland Bancassurance Kundenservice GmbH