Artificial Intelligence Training Courses

Artificial Intelligence Training

AI, Synthetic Intelligence training.
NobleProg offers training across the field of AI, including Machine Learning, Big Data, Rule Engines (Reasoners), Automatic Process Optimization and Meta-heuristics.

Client Testimonials

Business Rule Management (BRMS) with Drools Training Course

I appreciate the effort made by NobleProg and the trainer in particular to hold this course. Bernard not only described the features of the product, he also helped me understand how it fits with my project.

Fernando Orus - InSynergy Consulting SA

Managing Business Logic with Drools

A very good overview of Drools with some deep dives in the code and practicals.

Patrick Phelan - Sun Life Financial

Introduction to Drools 6 for Developers

Lots of exercises, which were good and which were well-administered.

Joseph Richardson - Sandia National Labs

Artificial Neural Networks, Machine Learning, Deep Thinking

It was very interactive and more relaxed and informal than expected. We covered lots of topics in the time and the trainer was always receptive to talking more in detail or more generally about the topics and how they were related. I feel the training has given me the tools to continue learning as opposed to it being a one off session where learning stops once you've finished which is very important given the scale and complexity of the topic.

Jonathan Blease - Knowledgepool Group Ltd

Introduction to the use of neural networks

Ann created a great environment to ask questions and learn. We had a lot of fun and also learned a lot at the same time.

Gudrun Bickelq - Tricentis GmbH

Introduction to the use of neural networks

the interactive part, tailored to our specific needs

Thomas Stocker - Tricentis GmbH

Natural Language Processing with Python

I did like the exercises

- Office for National Statistics

Solr for Developers

He provided great examples for each topic

Onoriode Ikede - Government of Prince Edward Island

Computer Vision with OpenCV

The hands-on approach

Kevin De Cuyper - Automatic Systems

Solr for Developers

The trainer provided great examples for each topic

Onoriode Ikede - Government of Prince Edward Island

Applied Machine Learning

ref material to use later was very good

PAUL BEALES - Seagate Technology

Business Rule Management (BRMS) with Drools

good atmosphere

Martin Jesterschawek - OSRAM Opto Semiconductors GmbH

Data Mining with R

very tailored to needs

Yashan Wang - MoneyGram International

Introduction to Deep Learning

The topic is very interesting

Wojciech Baranowski - Dolby Poland Sp. z o.o.

Introduction to Deep Learning

The trainer's theoretical knowledge and willingness to solve problems with the participants after the training

Grzegorz Mianowski - Dolby Poland Sp. z o.o.

Introduction to Deep Learning

Topic. Very interesting!

Piotr - Dolby Poland Sp. z o.o.

Introduction to Deep Learning

Exercises after each topic were really helpful, although they were too complicated at the end. In general, the presented material was very interesting and engaging! Exercises with image recognition were great.

- Dolby Poland Sp. z o.o.

Advanced Deep Learning

The global overview of deep learning

Bruno Charbonnier - OSONES

Advanced Deep Learning

The exercises are sufficiently practical and do not need a high knowledge in Python to be done.

Alexandre GIRARD - OSONES

Advanced Deep Learning

Doing exercises on real examples using Keras. Mihaly totally understood our expectations about this training.

Paul Kassis - OSONES

Introduction to Deep Learning

Interesting subject

Wojciech Wilk - Dolby Poland Sp. z o.o.

Spark for Developers

Richard is very calm and methodical, with an analytical insight - exactly the qualities needed to present this sort of course

Kieran Mac Kenna - BAE Systems Applied Intelligence

Introduction to Drools 6 for Developers

Interactive approach, keeps the training interesting.

Elaine McCarthy - Sun Life Financial

Introduction to Drools 6 for Developers

very well delivered

Damien Reid - Sun Life Financial

Introduction to Drools 6 for Developers

Interactive trainer, helpful and had lots of suggestions for participants.

Liam Donovan - Sun Life Financial

Introduction to Drools 6 for Developers

Nice to see some other editors, other details around bpmn

Derek Doherty - Sun Life Financial

Introduction to Drools 6 for Developers

Exercises in Eclipse

Anna Beluskova - Sun Life Financial

Introduction to Drools 6 for Developers

The exercises were great and the material is short and concise.

Anjali Sharma - Sun Life Financial

Introduction to Drools 6 for Developers

it met our expectations

Vadim Bilan - Sun Life Financial

Introduction to Drools 6 for Developers

Maintaining speed while taking everyone in the group along. Exercise oriented. Tried to cover as much as was comfortably possible.

Rakesh Prajapati - Sun Life Financial

Introduction to Drools 6 for Developers

Flexibility and thorough explanations regarding the usage

Denis Kirchhübel - Eldor Technology AS

Introduction to Drools 6 for Developers

Positive and optimistic attitude. Gives good answers to questions.

Emil Krabbe Nielsen - Eldor Technology AS

Data Mining and Analysis

I liked the exercises we did

Nour Assaf - Murex Services S.A.L (Offshore)

Data Mining and Analysis

The hands-on exercises and the trainer's capacity to explain complex topics in simple terms

youssef chamoun - Murex Services S.A.L (Offshore)

Data Mining and Analysis

The information given was interesting, and the best part was towards the end, when we were provided with data from Murex and worked on data we are familiar with, performing operations to get results.

Jessica Chaar - Murex Services S.A.L (Offshore)

Neural Networks Fundamentals using TensorFlow as Example

Knowledgeable trainer

Sridhar Voorakkara - INTEL R&D IRELAND LIMITED

Neural Networks Fundamentals using TensorFlow as Example

I was amazed at the standard of this class - I would say that it was university standard.

David Relihan - INTEL R&D IRELAND LIMITED

Neural Networks Fundamentals using TensorFlow as Example

Very good all-round overview. Good background into why TensorFlow operates as it does.

Kieran Conboy - INTEL R&D IRELAND LIMITED

Neural Networks Fundamentals using TensorFlow as Example

I liked the opportunities to ask questions and get more in depth explanations of the theory.

Sharon Ruane - INTEL R&D IRELAND LIMITED

Administrator Training for Apache Hadoop

The trainer gives real-life examples

Simon Hahn - OPITZ CONSULTING Deutschland GmbH

Administrator Training for Apache Hadoop

The trainer's strong competence

Grzegorz Gorski - OPITZ CONSULTING Deutschland GmbH

Administrator Training for Apache Hadoop

Many hands-on sessions.

Jacek Pieczątka - OPITZ CONSULTING Deutschland GmbH

Neural Network in R

new insights in deep machine learning

Josip Arneric - Faculty of Economics and Business Zagreb

Neural Network in R

We gained some knowledge about NN in general, and what was the most interesting for me were the new types of NN that are popular nowadays.

Tea Poklepovic - Faculty of Economics and Business Zagreb

Neural Network in R

Graphs in R :)))

- Faculty of Economics and Business Zagreb

Data Visualization

I thought that the information was interesting.

Allison May - Virginia Department of Education

Data Visualization

I really appreciated that Jeff utilized data and examples that were applicable to education data. He made it interesting and interactive.

Carol Wells Bazzichi - Virginia Department of Education

Data Visualization

Learning about all the chart types and what they are used for. Learning the value of decluttering. Learning about the methods to show time data.

Susan Williams - Virginia Department of Education

Data Visualization

Trainer was enthusiastic.

Diane Lucas - Virginia Department of Education

Data Visualization

Content / Instructor

Craig Roberson - Virginia Department of Education

Data Visualization

I am a hands-on learner and this was something that he did a lot of.

Lisa Comfort - Virginia Department of Education

Introduction to Deep Learning

The deep knowledge of the trainer about the topic.

Sebastian Görg - FANUC Europe Corporation

Data Visualization

The examples.

peter coleman - Virginia Department of Education

Data Visualization

Good real world examples, reviews of existing reports

Ronald Parrish - Virginia Department of Education

Cassandra for Developers

Topics approached. Very complete.

Carlos Eloi Barros - Farfetch Portugal - Unipessoal, Lda

Cassandra for Developers

The last exercise was very good.

José Monteiro - Farfetch Portugal - Unipessoal, Lda

Cassandra for Developers

I am already using Cassandra and have an application in production with it, so I already knew most of the topics, but the data modeling and advanced topics were very interesting.

Tiago Costa - Farfetch Portugal - Unipessoal, Lda

Cassandra for Developers

There was a lot of knowledge and material shared that will help me to do my current tasks.

Miguel Fernandes - Farfetch Portugal - Unipessoal, Lda

Cassandra for Developers

The amount of exercises. We could immediately apply the knowledge shared and ensure the information was on point.

Joana Pereira - Farfetch Portugal - Unipessoal, Lda

Cassandra for Developers

All technical explanation and theoretical introduction

André Santos - Farfetch Portugal - Unipessoal, Lda

Cassandra for Developers

Very good explanations with in depth examples

Rui Magalhaes - Farfetch Portugal - Unipessoal, Lda

Cassandra for Developers

The practical exercises and examples of implementing examples of real models and contexts

Leandro Gomes - Farfetch Portugal - Unipessoal, Lda

A practical introduction to Data Analysis and Big Data

Willingness to share more

Balaram Chandra Paul - MOL Information Technology Asia Limited

Spark for Developers

We now know a lot more about the whole environment

John Kidd - Cardano Risk Management

Spark for Developers

The trainer made the class interesting and entertaining which helps quite a bit with all day trainings

Ryan Speelman

Spark for Developers

I think the trainer had an excellent style of combining humor and real life stories to make the subjects at hand very approachable. I would highly recommend this professor in the future.

Spark for Developers

Ernesto did a great job explaining the high-level concepts of using Spark and its various modules.

Michael Nemerouf

IoT (Internet of Things) for Entrepreneurs, Managers and Investors

Some new and interesting ideas. Meeting and interacting with other attendees

TECTERRA

Neural Networks Fundamentals using TensorFlow as Example

Given outlook of the technology: what technology/process might become more important in the future; see, what the technology can be used for

Commerzbank AG

Cassandra Administration

The 1:1 style meant the training was tailored to my individual needs.

Andy McGuigan - Axon Public Safety UK Limited

Neural Networks Fundamentals using TensorFlow as Example

Topic selection. Style of training. Practice orientation

Commerzbank AG

A practical introduction to Data Analysis and Big Data

It covered a broad range of information.

Continental AG / Abteilung: CF IT Finance

A practical introduction to Data Analysis and Big Data

presentation of technologies

Continental AG / Abteilung: CF IT Finance

A practical introduction to Data Analysis and Big Data

Overall the Content was good.

Sameer Rohadia - Continental AG / Abteilung: CF IT Finance

Beyond the relational database: neo4j

Flexibility to blend in with Autodata related details to get more of a real world scenario as we went on.

Autodata Ltd

Beyond the relational database: neo4j

The trainer did bring some good insight and ways to approach developing a graph database. He used examples from the slides presented but also drew on his own experience which was good.

Autodata Ltd

Artificial Intelligence Course Outlines

Code Name Duration Overview
wfsadm WildFly Server Administration 14 hours This course is created for Administrators, Developers or anyone who is interested in managing WildFly Application Server (AKA JBoss Application Server). This course usually runs on the newest version of the Application Server, but it can be tailored (as a private course) to older versions starting from version 5.1. Module 1: Installing Core Components Installing the Java environment  Installing JBoss AS Application server features Creating a custom server configuration Module 2: Customizing JBoss AS Services How to monitor JBoss AS services JBoss AS thread pool Configuring logging services Configuring the connection to the database Configuring the transaction service Module 3. Deploying EJB 3 Session Beans Developing Enterprise JavaBeans Configuring the EJB container Module 4: Deploying a Web Application Developing web layout Configuring JBoss Web Server Module 5: Deploying Applications with JBoss Messaging Service The new JBoss Messaging system Developing JMS applications Advanced JBoss Messaging Module 6: Managing JBoss AS Introducing Java Management Extension JBoss AS Administration Console Managing applications Administering resources
MLFWR1 Machine Learning Fundamentals with R 14 hours The aim of this course is to provide a basic proficiency in applying Machine Learning methods in practice. Through the use of the R programming platform and its various libraries, and based on a multitude of practical examples this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, interpret the outputs of the algorithms and validate the results. Our goal is to give you the skills to understand and use the most fundamental tools from the Machine Learning toolbox confidently and avoid the common pitfalls of Data Sciences applications. Introduction to Applied Machine Learning Statistical learning vs. Machine learning Iteration and evaluation Bias-Variance trade-off Regression Linear regression Generalizations and Nonlinearity Exercises Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Exercises Cross-validation and Resampling Cross-validation approaches Bootstrap Exercises Unsupervised Learning K-means clustering Examples Challenges of unsupervised learning and beyond K-means
dladv Advanced Deep Learning 28 hours Machine Learning Limitations Machine Learning, Non-linear mappings Neural Networks Non-Linear Optimization, Stochastic/MiniBatch Gradient Descent Back Propagation Deep Sparse Coding Sparse Autoencoders (SAE) Convolutional Neural Networks (CNNs) Successes: Descriptor Matching Stereo-based Obstacle Avoidance for Robotics Pooling and invariance Visualization/Deconvolutional Networks Recurrent Neural Networks (RNNs) and their optimization Applications to NLP RNNs continued, Hessian-Free Optimization Language analysis: word/sentence vectors, parsing, sentiment analysis, etc. Probabilistic Graphical Models Hopfield Nets, Boltzmann machines, Restricted Boltzmann Machines Hopfield Networks, (Restricted) Boltzmann Machines Deep Belief Nets, Stacked RBMs Applications to NLP, Pose and Activity Recognition in Videos Recent Advances Large-Scale Learning Neural Turing Machines
facebooknmt Facebook NMT: Setting up a neural machine translation system 7 hours Fairseq is an open-source sequence-to-sequence learning toolkit created by Facebook for use in Neural Machine Translation (NMT). In this training participants will learn how to use Fairseq to carry out translation of sample content. By the end of this training, participants will have the knowledge and practice needed to implement a live Fairseq-based machine translation solution. Audience Localization specialists with a technical background Global content managers Localization engineers Software developers in charge of implementing global content solutions Format of the course Part lecture, part discussion, heavy hands-on practice Note If you wish to use specific source and target language content, please contact us to arrange. Introduction Why Neural Machine Translation? Borrowing from image recognition techniques Overview of the Torch and Caffe2 projects Overview of a Convolutional Neural Machine Translation model Convolutional Sequence to Sequence Learning Convolutional Encoder Model for Neural Machine Translation Standard LSTM-based model Overview of training approaches About GPUs and CPUs Fast beam search generation Installation and setup Evaluating pre-trained models Preprocessing your data Training the model Translating Converting a trained model to use CPU-only operations Joining the community Closing remarks
zeppelin Zeppelin for interactive data analytics 14 hours Apache Zeppelin is a web-based notebook for capturing, exploring, visualizing and sharing Hadoop and Spark based data. This instructor-led, live training introduces the concepts behind interactive data analytics and walks participants through the deployment and usage of Zeppelin in a single-user or multi-user environment. By the end of this training, participants will be able to: Install and configure Zeppelin Develop, organize, execute and share data in a browser-based interface Visualize results without referring to the command line or cluster details Execute and collaborate on long workflows Work with any of a number of plug-in language/data-processing backends, such as Scala (with Apache Spark), Python (with Apache Spark), Spark SQL, JDBC, Markdown and Shell. Integrate Zeppelin with Spark, Flink and MapReduce Secure multi-user instances of Zeppelin with Apache Shiro Audience Data engineers Data analysts Data scientists Software developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
dlv Deep Learning for Vision 21 hours Audience This course is suitable for Deep Learning researchers and engineers interested in utilizing available tools (mostly open source) for analyzing computer images. This course provides working examples. Deep Learning vs Machine Learning vs Other Methods When Deep Learning is suitable Limits of Deep Learning Comparing accuracy and cost of different methods Methods Overview Nets and Layers Forward / Backward: the essential computations of layered compositional models. Loss: the task to be learned is defined by the loss. Solver: the solver coordinates model optimization. Layer Catalogue: the layer is the fundamental unit of modeling and computation Convolution Methods and models Backprop, modular models Logsum module RBF Net MAP/MLE loss Parameter Space Transforms Convolutional Module Gradient-Based Learning Energy for inference, Objective for learning PCA; NLL: Latent Variable Models Probabilistic LVM Loss Function Detection with Fast R-CNN Sequences with LSTMs and Vision + Language with LRCN Pixelwise prediction with FCNs Framework design and future Tools Caffe TensorFlow R Matlab Others...
odmblockchain IBM ODM and Blockchain: Applying business rules to Smart Contracts 14 hours Smart Contracts are used to encode and encapsulate the rules for automatically initiating and processing transactions on the Blockchain. In this instructor-led, live training, participants will learn how to use IBM Operational Decision Manager (ODM) with Hyperledger Composer to implement the business logic of a Smart Contract using business rules. By the end of this training, participants will be able to: Use ODM's rule engine together with Blockchain to "unbury" rules from the codebase of a Blockchain application Set up a system to allow specialists such as accountants, auditors, lawyers, and analysts to define the rules of exchange for themselves Use Decision Center as a platform to collaboratively govern rules Use ODM's rule engine to update, test and deploy rules without touching the code of the Smart Contract Deploy the IBM ODM Rule Execution Server Integrate IBM ODM with Hyperledger Composer running on Hyperledger Fabric Audience Developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
nifi Apache NiFi for Administrators 21 hours Apache NiFi (Hortonworks DataFlow) is a real-time integrated data logistics and simple event processing platform that enables the moving, tracking and automation of data between systems. It is written using flow-based programming and provides a web-based user interface to manage dataflows in real time. In this instructor-led, live training, participants will learn how to deploy and manage Apache NiFi in a live lab environment. By the end of this training, participants will be able to: Install and configure Apache NiFi Source, transform and manage data from disparate, distributed data sources, including databases and big data lakes Automate dataflows Enable streaming analytics Apply various approaches for data ingestion Transform Big Data into business insights Audience System administrators Data engineers Developers DevOps Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction to Apache NiFi Data at rest vs data in motion Overview of big data and Apache Hadoop HDFS and MapReduce architecture Installing and configuring NiFi Cluster integration NiFi FlowFile Processor NiFi Flow Controller Database aggregating, splitting and transforming Troubleshooting Closing remarks
pythontextml Python: Machine Learning with Text 21 hours In this instructor-led, live training, participants will learn how to use the right machine learning and NLP (Natural Language Processing) techniques to extract value from text-based data. By the end of this training, participants will be able to: Solve text-based data science problems with high-quality, reusable code Apply different aspects of scikit-learn (classification, clustering, regression, dimensionality reduction) to solve problems Build effective machine learning models using text-based data Create a dataset and extract features from unstructured text Visualize data with Matplotlib Build and evaluate models to gain insight Troubleshoot text encoding errors Audience Developers Data Scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction     The value of text-based data Workflow for a Text-Based Data Science Problem Choosing the Right Machine Learning Libraries Overview of NLP Techniques Preparing a Dataset Visualizing the Data Working with Text Data with scikit-learn Building a Machine Learning Model Splitting into Train and Test Sets Applying Linear Regression and Non-Linear Regression Applying NLP Techniques Parsing Text Data Using Regular Expressions Exploring Other Machine Language Approaches Troubleshooting Text Encoding Issues Closing Remarks
ApHadm1 Apache Hadoop: Manipulation and Transformation of Data Performance 21 hours This course is intended for developers, architects, data scientists or any profile that requires access to data either intensively or on a regular basis. The major focus of the course is data manipulation and transformation. Among the tools in the Hadoop ecosystem this course includes the use of Pig and Hive, both of which are heavily used for data transformation and manipulation. This training also addresses performance metrics and performance optimisation. The course is entirely hands-on and is punctuated by presentations of the theoretical aspects. 1.1 Hadoop Concepts 1.1.1 HDFS The Design of HDFS Command line interface Hadoop File System 1.1.2 Clusters Anatomy of a cluster Master Node / Slave node Name Node / Data Node 1.2 Data Manipulation 1.2.1 MapReduce detailed Map phase Reduce phase Shuffle 1.2.2 Analytics with MapReduce Group-By with MapReduce Frequency distributions and sorting with MapReduce Plotting results (GNU Plot) Histograms with MapReduce Scatter plots with MapReduce Parsing complex datasets Counting with MapReduce and Combiners Build reports 1.2.3 Data Cleansing Document Cleaning Fuzzy string search Record linkage / data deduplication Transform and sort event dates Validate source reliability Trim Outliers 1.2.4 Extracting and Transforming Data Transforming logs Using Apache Pig to filter Using Apache Pig to sort Using Apache Pig to sessionize 1.2.5 Advanced Joins Joining data in the Mapper using MapReduce Joining data using Apache Pig replicated join Joining sorted data using Apache Pig merge join Joining skewed data using Apache Pig skewed join Using a map-side join in Apache Hive Using optimized full outer joins in Apache Hive Joining data using an external key value store 1.3 Performance Diagnosis and Optimization Techniques Map Investigating spikes in input data Identifying map-side data skew problems Map task throughput Small files Unsplittable files Reduce Too few or too many reducers Reduce-side data skew problems Reduce tasks throughput Slow shuffle and sort Competing jobs and scheduler throttling Stack dumps & unoptimized code Hardware failures CPU contention Tasks Extracting and visualizing task execution times Profiling your map and reduce tasks Avoid the reducer Filter and project Using the combiner Fast sorting with comparators Collecting skewed data Reduce skew mitigation
nlp Natural Language Processing 21 hours This course has been designed for people interested in extracting meaning from written English text, though the knowledge can be applied to other human languages as well. The course will cover how to make use of text written by humans, such as blog posts, tweets, etc. For example, an analyst can set up an algorithm which will reach a conclusion automatically based on an extensive data source. Short Introduction to NLP methods word and sentence tokenization text classification sentiment analysis spelling correction information extraction parsing meaning extraction question answering Overview of NLP theory probability statistics machine learning n-gram language modeling naive bayes maxent classifiers sequence models (Hidden Markov Models) probabilistic dependency constituent parsing vector-space models of meaning
mdldromgdmn Modelling Decision and Rules with OMG DMN 14 hours This course teaches how to design and execute decisions in rules with the OMG DMN (Decision Model and Notation) standard. Introduction to DMN Short history Basic concepts Decision requirements Decision log Scope and uses of DMN (human and automated decision making) Decision Requirements DRG DRD Decision Table Simple Expression Language (S-FEEL) FEEL Overview of Execution Tools available on the market Simple scenarios and workshop for executing the decision tables
bigdatastore Big Data Storage Solution - NoSQL 14 hours When traditional storage technologies can't handle the amount of data you need to store, there are hundreds of alternatives. This course guides participants through the alternatives for storing and analyzing Big Data and their pros and cons. This course is mostly focused on discussion and presentation of solutions, though hands-on exercises are available on demand. Limits of Traditional Technologies SQL databases Redundancy: replicas and clusters Constraints Speed Overview of database types Object Databases Document Store Cloud Databases Wide Column Store Multidimensional Databases Multivalue Databases Streaming and Time Series Databases Multimodel Databases Graph Databases Key Value XML Databases Distributed file systems Popular NoSQL Databases MongoDB Cassandra Apache Hadoop Apache Spark other solutions NewSQL Overview of available solutions Performance Inconsistencies Document Storage/Search Optimized Solr/Lucene/Elasticsearch other solutions
tpuprogramming TPU Programming: Building Neural Network Applications on Tensor Processing Units 7 hours The Tensor Processing Unit (TPU) is the architecture which Google has used internally for several years, and is just now becoming available for use by the general public. It includes several optimizations specifically for use in neural networks, including streamlined matrix multiplication, and 8-bit integers instead of 16-bit in order to return appropriate levels of precision. In this instructor-led, live training, participants will learn how to take advantage of the innovations in TPU processors to maximize the performance of their own AI applications. By the end of the training, participants will be able to: Train various types of neural networks on large amounts of data Use TPUs to speed up the inference process by up to two orders of magnitude Utilize TPUs to process intensive applications such as image search, cloud vision and photos Audience Developers Researchers Engineers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
samza Samza for stream processing 14 hours Apache Samza is an open-source near-realtime, asynchronous computational framework for stream processing.  It uses Apache Kafka for messaging, and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource management. This instructor-led, live training introduces the principles behind messaging systems and distributed stream processing, while walking participants through the creation of a sample Samza-based project and job execution. By the end of this training, participants will be able to: Use Samza to simplify the code needed to produce and consume messages Decouple the handling of messages from an application Use Samza to implement near-realtime asynchronous computation Use stream processing to provide a higher level of abstraction over messaging systems Audience Developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.  
aiauto Artificial Intelligence in Automotive 14 hours This course covers AI (emphasizing Machine Learning and Deep Learning) in the Automotive Industry. It helps to determine which technology can (potentially) be used in multiple situations in a car: from simple automation and image recognition to autonomous decision making. Current state of the technology What is used What may be potentially used Rules based AI Simplifying decision Machine Learning Classification Clustering Neural Networks Types of Neural Networks Presentation of working examples and discussion Deep Learning Basic vocabulary When to use Deep Learning, when not to Estimating computational resources and cost Very short theoretical background to Deep Neural Networks Deep Learning in practice (mainly using TensorFlow) Preparing Data Choosing loss function Choosing appropriate type of neural network Accuracy vs speed and resources Training neural network Measuring efficiency and error Sample usage Anomaly detection Image recognition ADAS
simplecv Computer Vision with SimpleCV 14 hours SimpleCV is an open source framework, meaning that it is a collection of libraries and software that you can use to develop vision applications. It lets you work with the images or video streams that come from webcams, Kinects, FireWire and IP cameras, or mobile phones. It helps you build software to make your various technologies not only see the world, but understand it too. Audience This course is directed at engineers and developers seeking to develop computer vision applications with SimpleCV. Getting Started Installation Tutorials & Examples SimpleCV Shell SimpleCV Basics The Hello World program Interacting with the Display Loading a Directory of Images Macros Kinect Timing Detecting a Car Segmenting the Image and Morphology Image Arithmetic Exceptions in Image Math Histograms Color Space Using Hue Peaks Creating a Motion Blur Effect Simulating Long Exposure Chroma Key (Green Screen) Drawing on Images in SimpleCV Layers Marking up the Image Text and Fonts Making a Custom Display Object
nifidev Apache NiFi for Developers 7 hours Apache NiFi (Hortonworks DataFlow) is a real-time integrated data logistics and simple event processing platform that enables the moving, tracking and automation of data between systems. It is written using flow-based programming and provides a web-based user interface to manage dataflows in real time. In this instructor-led, live training, participants will learn the fundamentals of flow-based programming as they develop a number of demo extensions, components and processors using Apache NiFi. By the end of this training, participants will be able to: Understand NiFi's architecture and dataflow concepts Develop extensions using NiFi and third-party APIs Custom develop their own Apache Nifi processor Ingest and process real-time data from disparate and uncommon file formats and data sources Audience Developers Data engineers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction     Data at rest vs data in motion Overview of big data tools and technologies     Hadoop (HDFS and MapReduce) and Spark Installing and configuring NiFi Overview of NiFi architecture Development approaches     Application development tools and mindset     Extract, Transform, and Load (ETL) tools and mindset Design considerations Components, events, and processor patterns Exercise: Streaming data feeds into HDFS Error Handling Controller Services Exercise: Ingesting data from IoT devices using web-based APIs Exercise: Developing a custom Apache Nifi processor using JSON Testing and troubleshooting Contributing to Apache NiFi Closing remarks
voldemort Voldemort: Setting up a key-value distributed data store 14 hours Voldemort is an open-source distributed data store that is designed as a key-value store. It is used at LinkedIn by numerous critical services powering a large portion of the site. This course will introduce the architecture and capabilities of Voldemort and walk participants through the setup and application of a key-value distributed data store. Audience Software developers System administrators DevOps engineers Format of the course Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding Introduction Understanding distributed key-value storage systems Voldemort data model and architecture Downloading and configuration Command line operations Clients and servers Working with Hadoop Configuring build and push jobs Rebalancing a Voldemort instance Serving Large-scale Batch Computed Data Using the Admin Tool Performance tuning
pythonmultipurpose Advanced Python 28 hours In this instructor-led training, participants will learn advanced Python programming techniques, including how to apply this versatile language to solve problems in areas such as distributed applications, finance, data analysis and visualization, UI programming and maintenance scripting. Audience Developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Notes If you wish to add, remove or customize any section or topic within this course, please contact us to arrange. Introduction Python versatility: from data analysis to web crawling Python data structures and operations Integers and floats Strings and bytes Tuples and lists Dictionaries and ordered dictionaries Sets and frozen sets Data frame (pandas) Conversions Object-oriented programming with Python Inheritance Polymorphism Static classes Static functions Decorators Other Data Analysis with pandas Data cleaning Using vectorized data in pandas Data wrangling Sorting and filtering data Aggregate operations Analyzing time series Data visualization Plotting diagrams with matplotlib Using matplotlib from within pandas Creating quality diagrams Visualizing data in Jupyter notebooks Other visualization libraries in Python Vectorizing Data in Numpy Creating Numpy arrays Common operations on matrices Using ufuncs Views and broadcasting on Numpy arrays Optimizing performance by avoiding loops Optimizing performance with cProfile Processing Big Data with Python Building and supporting distributed applications with Python Data storage: Working with SQL and NoSQL databases Distributed processing with Hadoop and Spark Scaling your applications Python for finance Packages, libraries and APIs for financial processing Zipline PyAlgoTrade Pybacktest quantlib Python APIs Extending Python (and vice versa) with other languages C# Java C++ Perl Others Python multi-threaded programming Modules Synchronizing Prioritizing UI programming with Python 
Framework options for building GUIs in Python Tkinter PyQt Python for maintenance scripting Raising and catching exceptions correctly Organizing code into modules and packages Understanding symbol tables and accessing them in code Picking a testing framework and applying TDD in Python Python for the web Packages for web processing Web crawling Parsing HTML and XML Filling web forms automatically Closing remarks
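As an illustration of the "Decorators" topic in the Advanced Python outline above, here is a minimal sketch of a caching decorator. It is not taken from the course material; the names `memoize` and `fib` are hypothetical and chosen only for this example.

```python
import functools


def memoize(func):
    """Cache results of func keyed by its positional arguments."""
    cache = {}

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed so callers can inspect the cache
    return wrapper


@memoize
def fib(n):
    """Return the n-th Fibonacci number (fib(0) == 0, fib(1) == 1)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
```

Because the recursive calls also go through the wrapper, each value is computed once. In practice the standard library's `functools.lru_cache` covers the same use case.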
sspsspas Statistics with SPSS Predictive Analytics Software 14 hours Goal: Learn to work with SPSS independently. Audience: Analysts, researchers, scientists, students and all those who want to acquire the ability to use the SPSS package and learn popular data mining techniques. Using the program The dialog boxes Inputting / downloading data The concept of variables and measurement scales Preparing a database Generating tables and graphs Formatting the report Command language syntax Automated analysis Storing and modifying procedures Creating your own analytical procedures Data Analysis Descriptive statistics Key terms: e.g. variable, hypothesis, statistical significance Measures of central tendency Measures of dispersion Standardization Introduction to researching the relationships between variables Correlational and experimental methods Summary: case study and discussion
annmldt Artificial Neural Networks, Machine Learning, Deep Thinking 21 hours DAY 1 - ARTIFICIAL NEURAL NETWORKS Introduction and ANN Structure. Biological neurons and artificial neurons. Model of an ANN. Activation functions used in ANNs. Typical classes of network architectures. Mathematical Foundations and Learning mechanisms. Re-visiting vector and matrix algebra. State-space concepts. Concepts of optimization. Error-correction learning. Memory-based learning. Hebbian learning. Competitive learning. Single layer perceptrons. Structure and learning of perceptrons. Pattern classifier - introduction and Bayes' classifiers. Perceptron as a pattern classifier. Perceptron convergence. Limitations of perceptrons. Feedforward ANN. Structures of Multi-layer feedforward networks. Back propagation algorithm. Back propagation - training and convergence. Functional approximation with back propagation. Practical and design issues of back propagation learning. Radial Basis Function Networks. Pattern separability and interpolation. Regularization Theory. Regularization and RBF networks. RBF network design and training. Approximation properties of RBF. Competitive Learning and Self organizing ANN. General clustering procedures. Learning Vector Quantization (LVQ). Competitive learning algorithms and architectures. Self organizing feature maps. Properties of feature maps. Fuzzy Neural Networks. Neuro-fuzzy systems. Background of fuzzy sets and logic. Design of fuzzy systems. Design of fuzzy ANNs. Applications A few examples of Neural Network applications, their advantages and problems will be discussed. DAY 2 - MACHINE LEARNING The PAC Learning Framework Guarantees for finite hypothesis set – consistent case Guarantees for finite hypothesis set – inconsistent case Generalities Deterministic vs. Stochastic scenarios Bayes error noise Estimation and approximation errors Model selection Rademacher Complexity and VC-Dimension Bias-Variance tradeoff Regularisation Over-fitting Validation Support Vector Machines Kriging (Gaussian Process regression) PCA and Kernel PCA Self-Organising Maps (SOM) Kernel induced vector space Mercer Kernels and Kernel-induced similarity metrics Reinforcement Learning DAY 3 - DEEP LEARNING This will be taught in relation to the topics covered on Day 1 and Day 2 Logistic and Softmax Regression Sparse Autoencoders Vectorization, PCA and Whitening Self-Taught Learning Deep Networks Linear Decoders Convolution and Pooling Sparse Coding Independent Component Analysis Canonical Correlation Analysis Demos and Applications
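The Day 1 topics "Structure and learning of perceptrons" and "Perceptron convergence" in the outline above can be illustrated with a small sketch of the classic perceptron learning rule. This is an illustrative example, not course material: the function names, the learning rate, the epoch limit and the AND-gate data set are all assumptions made for the sketch.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Error-correction learning: adjust weights only on misclassified samples."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            err = target - predict(weights, bias, x)
            if err != 0:
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
                errors += 1
        if errors == 0:  # a full pass without mistakes: training has converged
            break
    return weights, bias


# Logical AND is linearly separable, so the rule converges on it.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_GATE)
predictions = [predict(weights, bias, x) for x, _ in AND_GATE]
```

The same loop never settles on a data set that is not linearly separable (XOR, for example), which is the limitation the outline refers to.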
sparkdev Spark for Developers 21 hours OBJECTIVE: This course will introduce Apache Spark. The students will learn how Spark fits into the Big Data ecosystem, and how to use Spark for data analysis. The course covers Spark shell for interactive data analysis, Spark internals, Spark APIs, Spark SQL, Spark streaming, and machine learning and GraphX. AUDIENCE : Developers / Data Analysts Scala primer A quick introduction to Scala Labs : Getting to know Scala Spark Basics Background and history Spark and Hadoop Spark concepts and architecture Spark ecosystem (core, Spark SQL, MLlib, streaming) Labs : Installing and running Spark First Look at Spark Running Spark in local mode Spark web UI Spark shell Analyzing dataset – part 1 Inspecting RDDs Labs: Spark shell exploration RDDs RDDs concepts Partitions RDD Operations / transformations RDD types Key-Value pair RDDs MapReduce on RDD Caching and persistence Labs : creating & inspecting RDDs; Caching RDDs Spark API programming Introduction to Spark API / RDD API Submitting the first program to Spark Debugging / logging Configuration properties Labs : Programming in Spark API, Submitting jobs Spark SQL SQL support in Spark Dataframes Defining tables and importing datasets Querying data frames using SQL Storage formats : JSON / Parquet Labs : Creating and querying data frames; evaluating data formats MLlib MLlib intro MLlib algorithms Labs : Writing MLlib applications GraphX GraphX library overview GraphX APIs Labs : Processing graph data using Spark Spark Streaming Streaming overview Evaluating Streaming platforms Streaming operations Sliding window operations Labs : Writing Spark streaming applications Spark and Hadoop Hadoop Intro (HDFS / YARN) Hadoop + Spark architecture Running Spark on Hadoop YARN Processing HDFS files using Spark Spark Performance and Tuning Broadcast variables Accumulators Memory management & caching Spark Operations Deploying Spark in production Sample deployment templates Configurations Monitoring
Troubleshooting
predio Machine Learning with PredictionIO 21 hours PredictionIO is an open source Machine Learning Server built on top of a state-of-the-art open source stack. Audience This course is directed at developers and data scientists who want to create predictive engines for any machine learning task. Getting Started Quick Intro Installation Guide Downloading Template Deploying an Engine Customizing an Engine App Integration Overview Developing PredictionIO System Architecture Event Server Overview Collecting Data Learning DASE Implementing DASE Evaluation Overview Intellij IDEA Guide Scala API Machine Learning Education and Usage Examples Comics Recommendation Text Classification Community Contributed Demo Dimensionality Reduction and usage PredictionIO SDKs (Select One) Java PHP Python Ruby Community Contributed
alluxio Alluxio: Unifying disparate storage systems 7 hours Alluxio is an open-source virtual distributed storage system that unifies disparate storage systems and enables applications to interact with data at memory speed. It is used by companies such as Intel, Baidu and Alibaba. In this instructor-led, live training, participants will learn how to use Alluxio to bridge different computation frameworks with storage systems and efficiently manage multi-petabyte scale data as they step through the creation of an application with Alluxio. By the end of this training, participants will be able to: Develop an application with Alluxio Connect big data systems and applications while preserving one namespace Efficiently extract value from big data in any storage format Improve workload performance Deploy and manage Alluxio standalone or clustered Audience Data scientist Developer System administrator Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
deckgl deck.gl: Visualizing Large-scale Geospatial Data 14 hours deck.gl is an open-source, WebGL-powered library for exploring and visualizing data assets at scale. Created by Uber, it is especially useful for gaining insights from geospatial data sources, such as data on maps. This instructor-led, live training introduces the concepts and functionality behind deck.gl and walks participants through the set up of a demonstration project. By the end of this training, participants will be able to: Take data from very large collections and turn it into compelling visual representations Visualize data collected from transportation and journey-related use cases, such as pick-up and drop-off experiences, network traffic, etc. Apply layering techniques to geospatial data to depict changes in data over time Integrate deck.gl with React (for Reactive programming) and Mapbox GL (for visualizations on Mapbox based maps). Understand and explore other use cases for deck.gl, including visualizing points collected from a 3D indoor scan, visualizing machine learning models in order to optimize their algorithms, etc. Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
singa Mastering Apache SINGA 21 hours SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. A variety of popular deep learning models are supported, namely feed-forward models including convolutional neural networks (CNN), energy models like restricted Boltzmann machine (RBM), and recurrent neural networks (RNN). Many built-in layers are provided for users. SINGA architecture is sufficiently flexible to run synchronous, asynchronous and hybrid training frameworks. SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on batch dimension, feature dimension or hybrid partitioning. Audience This course is directed at researchers, engineers and developers seeking to utilize Apache SINGA as a deep learning framework. After completing this course, delegates will: understand SINGA’s structure and deployment mechanisms be able to carry out installation / production environment / architecture tasks and configuration be able to assess code quality, perform debugging, monitoring be able to implement advanced production like training models, embedding terms, building graphs and logging   Introduction Installation Quick Start Programming NeuralNet Layer Param TrainOneBatch Updater  Distributed Training Data Preparation Checkpoint and Resume Python Binding Performance test and Feature extraction Training on GPU Examples Feed-forward models CNN MLP RBM + Auto-encoder Vanilla RNN for language modelling Char-RNN
optaprac OptaPlanner in Practice 21 hours This course uses a practical approach to teaching OptaPlanner. It provides participants with the tools needed to perform the basic functions of this tool. Planner introduction What is OptaPlanner? What is a planning problem? Use Cases and examples Bin Packing Problem Example Problem statement Problem size Domain model diagram Main method Solver configuration Domain model implementation Score configuration Travelling Salesman Problem (TSP) Problem statement Problem size Domain model Main method Chaining Solver configuration Domain model implementation Score configuration Planner configuration Overview Solver configuration Model your planning problem Use the Solver Score calculation Score terminology Choose a Score definition Calculate the Score Score calculation performance tricks Reusing the Score calculation outside the Solver Optimization algorithms Search space size in the real world Does Planner find the optimal solution? Architecture overview Optimization algorithms overview Which optimization algorithms should I use?
SolverPhase Scope overview Termination SolverEventListener Custom SolverPhase Move and neighborhood selection Move and neighborhood introduction Generic Move Selectors Combining multiple MoveSelectors EntitySelector ValueSelector General Selector features Custom moves Construction heuristics First Fit Best Fit Advanced Greedy Fit Cheapest insertion Regret insertion Local search Local Search concepts Hill Climbing (Simple Local Search) Tabu Search Simulated Annealing Late Acceptance Step counting hill climbing Late Simulated Annealing (experimental) Using a custom Termination, MoveSelector, EntitySelector, ValueSelector or Acceptor Evolutionary algorithms Evolutionary Strategies Genetic Algorithms Hyperheuristics Exact methods Brute Force Depth-first Search Benchmarking and tweaking Finding the best Solver configuration Doing a benchmark Benchmark report Summary statistics Statistics per dataset (graph and CSV) Advanced benchmarking Repeated planning Introduction to repeated planning Backup planning Continuous planning (windowed planning) Real-time planning (event based planning) Drools Short introduction to Drools Writing Score Function in Drools Integration Overview Persistent storage SOA and ESB Other environment
BigData_ A practical introduction to Data Analysis and Big Data 35 hours Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools. Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class. The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools and infrastructure that enable Big Data storage, Distributed Processing, and Scalability. Audience Developers / programmers IT consultants Format of the course Part lecture, part discussion, hands-on practice and implementation, occasional quizzing to measure progress. Introduction to Data Analysis and Big Data What makes Big Data "big"? Velocity, Volume, Variety, Veracity (VVVV) Limits to traditional Data Processing Distributed Processing Statistical Analysis Types of Machine Learning Analysis Data Visualization Languages used for Data Analysis R language Why R for Data Analysis? Data manipulation, calculation and graphical display Python Why Python for Data Analysis? Manipulating, processing, cleaning, and crunching data Approaches to Data Analysis Statistical Analysis Time Series analysis Forecasting with Correlation and Regression models Inferential Statistics (estimating) Descriptive Statistics in Big Data sets (e.g.
calculating mean) Machine Learning Supervised vs unsupervised learning Classification and clustering Estimating cost of specific methods Filtering Natural Language Processing Processing text Understanding meaning of the text Automatic text generation Sentiment analysis / Topic analysis Computer Vision Acquiring, processing, analyzing, and understanding images Reconstructing, interpreting and understanding 3D scenes Using image data to make decisions Big Data infrastructure Data Storage Relational databases (SQL) MySQL Postgres Oracle Non-relational databases (NoSQL) Cassandra MongoDB Neo4j Understanding the nuances Hierarchical databases Object-oriented databases Document-oriented databases Graph-oriented databases Other Distributed Processing Hadoop HDFS as a distributed filesystem MapReduce for distributed processing Spark All-in-one in-memory cluster computing framework for large-scale data processing Structured streaming Spark SQL Machine Learning libraries: MLlib Graph processing with GraphX Scalability Public cloud AWS, Google, Aliyun, etc. Private cloud OpenStack, Cloud Foundry, etc. Auto-scalability Choosing the right solution for the problem The future of Big Data Closing remarks
undnn Understanding Deep Neural Networks 35 hours This course begins by giving you conceptual knowledge of neural networks and, more generally, of machine learning algorithms and deep learning (algorithms and applications). Part 1 (40%) of this training focuses more on fundamentals, but will help you choose the right technology: TensorFlow, Caffe, Theano, DeepDrive, Keras, etc. Part 2 (20%) of this training introduces Theano, a Python library that makes writing deep learning models easy. Part 3 (40%) of the training is based extensively on TensorFlow, the 2nd generation API of Google's open source software library for Deep Learning. The examples and hands-on exercises are all done in TensorFlow. Audience This course is intended for engineers seeking to use TensorFlow for their Deep Learning projects. After completing this course, delegates will: have a good understanding of deep neural networks (DNN), CNN and RNN understand TensorFlow’s structure and deployment mechanisms be able to carry out installation / production environment / architecture tasks and configuration be able to assess code quality, perform debugging, monitoring be able to implement advanced production like training models, building graphs and logging Not all the topics will be covered in a public classroom with 35 hours duration due to the vastness of the subject. The duration of the complete course will be around 70 hours and not 35 hours. Part 1 – Deep Learning and DNN Concepts Introduction AI, Machine Learning & Deep Learning History, basic concepts and usual applications of artificial intelligence, far from the fantasies surrounding this field Collective Intelligence: aggregating knowledge shared by many virtual agents Genetic algorithms: evolving a population of virtual agents by selection Conventional Machine Learning: definition.
Types of tasks: supervised learning, unsupervised learning, reinforcement learning Types of actions: classification, regression, clustering, density estimation, dimensionality reduction Examples of Machine Learning algorithms: Linear regression, Naive Bayes, Random Tree Machine Learning vs Deep Learning: problems on which Machine Learning remains the state of the art today (Random Forests & XGBoost) Basic Concepts of a Neural Network (Application: multi-layer perceptron) Reminder of mathematical bases. Definition of a neural network: classical architecture, activation and weighting of previous activations, depth of a network Definition of neural network learning: cost functions, back-propagation, stochastic gradient descent, maximum likelihood. Modeling of a neural network: modeling input and output data according to the type of problem (regression, classification ...). Curse of dimensionality. Distinction between multi-feature data and signal. Choice of a cost function according to the data. Approximation of a function by a neural network: presentation and examples Approximation of a distribution by a neural network: presentation and examples Data Augmentation: how to balance a dataset Generalization of the results of a neural network. Initialization and regularization of a neural network: L1 / L2 regularization, Batch Normalization ... Optimization and convergence algorithms Standard ML / DL Tools A simple presentation with advantages, disadvantages, position in the ecosystem and use is planned. Data management tools: Apache Spark, Apache Hadoop Machine Learning tools: NumPy, SciPy, scikit-learn High-level DL frameworks: PyTorch, Keras, Lasagne Low-level DL frameworks: Theano, Torch, Caffe, TensorFlow Convolutional Neural Networks (CNN). Presentation of CNNs: fundamental principles and applications Basic operation of a CNN: convolutional layer, use of a kernel, padding & stride, feature map generation, pooling layers.
Extensions 1D, 2D and 3D. Presentation of the different CNN architectures that advanced the state of the art in image classification: LeNet, VGG Networks, Network in Network, Inception, ResNet. Presentation of the innovations brought about by each architecture and their more global applications (1x1 convolution or residual connections) Use of an attention model. Application to a common classification case (text or image) CNNs for generation: super-resolution, pixel-to-pixel segmentation. Presentation of the main strategies for increasing feature maps for image generation. Recurrent Neural Networks (RNN). Presentation of RNNs: fundamental principles and applications. Basic operation of the RNN: hidden activation, back propagation through time, unfolded version. Evolution towards Gated Recurrent Units (GRUs) and LSTM (Long Short-Term Memory). Presentation of the different states and the evolutions brought by these architectures Convergence and vanishing gradient problems Classical architectures: prediction of a time series, classification ... RNN encoder-decoder type architecture. Use of an attention model. NLP applications: word / character encoding, translation. Video applications: prediction of the next generated image of a video sequence. Generative models: Variational AutoEncoder (VAE) and Generative Adversarial Networks (GAN). Presentation of generative models, link with CNNs Auto-encoder: dimensionality reduction and limited generation Variational Auto-encoder: generative model and approximation of the distribution of the data. Definition and use of latent space. Reparameterization trick. Applications and limits observed Generative Adversarial Networks: Fundamentals. Dual network architecture (generator and discriminator) with alternate learning, cost functions available. Convergence of a GAN and difficulties encountered. Improved convergence: Wasserstein GAN, BEGAN. Earth Mover's Distance.
Applications for the generation of images or photographs, text generation, super-resolution. Deep Reinforcement Learning. Presentation of reinforcement learning: control of an agent in an environment defined by a state and possible actions Use of a neural network to approximate the state function Deep Q Learning: experience replay, and application to the control of a video game. Optimization of learning policy. On-policy and off-policy. Actor critic architecture. A3C. Applications: control of a single video game or a digital system. Part 2 – Theano for Deep Learning Theano Basics Introduction Installation and Configuration Theano Functions inputs, outputs, updates, givens Training and Optimization of a neural network using Theano Neural Network Modeling Logistic Regression Hidden Layers Training a network Computing and Classification Optimization Log Loss Testing the model Part 3 – DNN using TensorFlow TensorFlow Basics Creation, Initializing, Saving, and Restoring TensorFlow variables Feeding, Reading and Preloading TensorFlow Data How to use TensorFlow infrastructure to train models at scale Visualizing and Evaluating models with TensorBoard TensorFlow Mechanics Prepare the Data Download Inputs and Placeholders Build the Graph Inference Loss Training Train the Model The Graph The Session Train Loop Evaluate the Model Build the Eval Graph Eval Output The Perceptron Activation functions The perceptron learning algorithm Binary classification with the perceptron Document classification with the perceptron Limitations of the perceptron From the Perceptron to Support Vector Machines Kernels and the kernel trick Maximum margin classification and support vectors Artificial Neural Networks Nonlinear decision boundaries Feedforward and feedback artificial neural networks Multilayer perceptrons Minimizing the cost function Forward propagation Back propagation Improving the way neural networks learn Convolutional Neural Networks Goals Model Architecture Principles Code
Organization Launching and Training the Model Evaluating a Model Brief introductions to the modules below will be given as time permits: TensorFlow - Advanced Usage Threading and Queues Distributed TensorFlow Writing Documentation and Sharing your Model Customizing Data Readers Manipulating TensorFlow Model Files TensorFlow Serving Introduction Basic Serving Tutorial Advanced Serving Tutorial Serving Inception Model Tutorial
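The CNN topics in the Understanding Deep Neural Networks outline above (kernel, padding and stride, feature map generation, pooling layers) rest on one piece of arithmetic: how large the feature map is after a layer. The sketch below applies the standard formula floor((n + 2p - k) / s) + 1 along one spatial dimension. The function name and the sample layer sizes are assumptions made for illustration, not part of the course material.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Length of a feature map along one dimension after a convolution or pooling layer."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    padded = input_size + 2 * padding  # padding is added on both sides
    if kernel_size > padded:
        raise ValueError("kernel does not fit inside the padded input")
    return (padded - kernel_size) // stride + 1


# A 5x5 kernel without padding shrinks a 32-pixel side.
no_padding = conv_output_size(32, 5)
# A 7x7 kernel with padding 3 and stride 2 halves a 224-pixel side.
strided = conv_output_size(224, 7, padding=3, stride=2)
# 2x2 pooling with stride 2 halves a 28-pixel side.
pooled = conv_output_size(28, 2, stride=2)
```

The same function covers pooling layers, since they slide a window over the input in the same way.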
brmsdrools Business Rule Management (BRMS) with Drools 7 hours This course is aimed at enterprise architects, business and system analysts and managers who want to apply business rules to their solution. With Drools you can write your business rules using almost natural language, therefore reducing the gap between business and IT. Short Introduction to Rule Engines Artificial Intelligence Expert Systems What is a Rule Engine? Why use a Rule Engine? Advantages of a Rule Engine When should you use a Rule Engine? Scripting or Process Engines When you should NOT use a Rule Engine Strong and Loose Coupling What are rules? Creating and Implementing Rules Fact Model KIE Eclipse Domain Specific Language (DSL) Replacing rules with DSL Testing DSL rules jBPM Integration with Drools Fusion What is Complex Event Processing? Short overview on Fusion Rules Testing Testing with KIE Testing with JUnit Integrating Rules with Application
noolsint Introduction to Nools 7 hours Flows Defining A Flow Sessions Facts Assert Retract Modify Retrieving Facts Firing Disposing Removing A Flow Removing All Flows Checking If A Flow Exists Agenda Group Focus Auto Focus Conflict Resolution Defining Rules Structure Salience Scope Constraints Not Or From Exists Actions Async Actions Globals Import Browser Support
dsguihtml5jsre Designing Intelligent User Interface with HTML5, JavaScript and Rule Engines 21 hours Coding interfaces which allow users to get what they want easily is hard. This course shows you how to create effective UIs with the newest technologies and libraries. It introduces the idea of coding logic in Rule Engines (mostly Nools and PHP Rules) to make it easier to modify and test. After that, the course shows a way of integrating the logic on the front end of the website using JavaScript. Logic coded this way can be reused on the backend. Writing your rules Available rule engines Stating rules in a declarative manner Extending rules Create unit tests for the rules Available test frameworks Running tests automatically Creating GUI for the rules Available frameworks GUI design principles Integrating logic with the GUI Running rules in the browser Ajax Decision tables Create functional tests for the GUI Available frameworks Testing against multiple browsers
hadoopdeva Advanced Hadoop for Developers 21 hours Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course delves into data management in HDFS, advanced Pig, Hive, and HBase.  These advanced programming techniques will be beneficial to experienced Hadoop developers. Audience: developers Duration: three days Format: lectures (50%) and hands-on labs (50%).   Section 1: Data Management in HDFS Various Data Formats (JSON / Avro / Parquet) Compression Schemes Data Masking Labs : Analyzing different data formats;  enabling compression Section 2: Advanced Pig User-defined Functions Introduction to Pig Libraries (ElephantBird / Data-Fu) Loading Complex Structured Data using Pig Pig Tuning Labs : advanced pig scripting, parsing complex data types Section 3 : Advanced Hive User-defined Functions Compressed Tables Hive Performance Tuning Labs : creating compressed tables, evaluating table formats and configuration Section 4 : Advanced HBase Advanced Schema Modelling Compression Bulk Data Ingest Wide-table / Tall-table comparison HBase and Pig HBase and Hive HBase Performance Tuning Labs : tuning HBase; accessing HBase data from Pig & Hive; Using Phoenix for data modeling
cntk Using Computational Network Toolkit (CNTK) 28 hours Computational Network Toolkit (CNTK) is Microsoft's open-source, multi-machine, multi-GPU, highly efficient RNN-training machine learning framework for speech, text, and images. Audience This course is directed at engineers and architects aiming to utilize CNTK in their projects. Getting started Setup CNTK on your machine Enabling 1bit SGD Developing and Testing CNTK Production Test Configurations How to contribute to CNTK Tutorial Tutorial II CNTK usage overview Examples Presentations Multiple GPUs¹ and machines Configuring CNTK Config file overview Simple Network Builder BrainScript Network Builder SGD block Reader block Train, Test, Eval Top-level configurations Describing Networks Basic concepts Expressions Defining functions Full Function Reference Data readers Text Format Reader CNTK Text Format Reader UCI Fast Reader (deprecated) HTKMLF Reader LM sequence reader LU sequence reader Image reader Evaluating CNTK Models Overview C++ Evaluation Interface C# Evaluation Interface Evaluating Hidden Layers C# Image Transforms for Evaluation Advanced topics Command line parsing rules Top-level commands Plot command ConvertDBN command ¹ The topic related to the use of CNTK with a GPU is not available as a part of a remote course. This module can be delivered during classroom-based courses, but only by prior agreement, and only if both the trainer and all participants have laptops with supported NVIDIA GPUs (not provided by NobleProg). NobleProg cannot guarantee the availability of trainers with the required hardware.
flink Flink for scalable stream and batch data processing 28 hours To request a customized course outline for this training, please contact us.  
embeddingprojector Embedding Projector: Visualizing your Training Data 14 hours Embedding Projector is an open-source web application for visualizing the data used to train machine learning systems. Created by Google, it is part of TensorFlow. This instructor-led, live training introduces the concepts behind Embedding Projector and walks participants through the setup of a demo project. By the end of this training, participants will be able to: Explore how data is being interpreted by machine learning models Navigate through 3D and 2D views of data to understand how a machine learning algorithm interprets it Understand the concepts behind embeddings and their role in representing images, words and numerals as mathematical vectors Explore the properties of a specific embedding to understand the behavior of a model Apply Embedding Projector to real-world use cases such as building a song recommendation system for music lovers Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
dl4j Mastering Deeplearning4j 21 hours Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs. Audience This course is directed at engineers and developers seeking to utilize Deeplearning4j in their projects. After this course delegates will be able to: Getting Started Quickstart: Running Examples and DL4J in Your Projects Comprehensive Setup Guide Introduction to Neural Networks Restricted Boltzmann Machines Convolutional Nets (ConvNets) Long Short-Term Memory Units (LSTMs) Denoising Autoencoders Recurrent Nets and LSTMs Multilayer Neural Nets Deep-Belief Network Deep AutoEncoder Stacked Denoising Autoencoders Tutorials Using Recurrent Nets in DL4J MNIST DBN Tutorial Iris Flower Tutorial Canova: Vectorization Lib for ML Tools Neural Net Updaters: SGD, Adam, Adagrad, Adadelta, RMSProp Datasets Datasets and Machine Learning Custom Datasets CSV Data Uploads Scaleout Iterative Reduce Defined Multiprocessor / Clustering Running Worker Nodes Text DL4J's NLP Framework Word2vec for Java and Scala Textual Analysis and DL Bag of Words Sentence and Document Segmentation Tokenization Vocab Cache Advanced DL4J Build Locally From Master Contribute to DL4J (Developer Guide) Choose a Neural Net Use the Maven Build Tool Vectorize Data With Canova Build a Data Pipeline Run Benchmarks Configure DL4J in Ivy, Gradle, SBT etc Find a DL4J Class or Method Save and Load Models Interpret Neural Net Output Visualize Data with t-SNE Swap CPUs for GPUs Customize an Image Pipeline Perform Regression With Neural Nets Troubleshoot Training & Select Network Hyperparameters Visualize, Monitor and Debug Network Learning Speed Up Spark With Native Binaries Build a Recommendation Engine With DL4J Use Recurrent Networks in DL4J Build Complex Network Architectures with Computation Graph Train Networks using Early Stopping Download Snapshots With Maven Customize a Loss Function
datamodeling Pattern Recognition 35 hours This course provides an introduction to the field of pattern recognition and machine learning. It touches on practical applications in statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. The course is interactive and includes plenty of hands-on exercises, instructor feedback, and testing of knowledge and skills acquired. Audience Data analysts PhD students, researchers and practitioners Introduction Probability theory, model selection, decision and information theory Probability distributions Linear models for regression and classification Neural networks Kernel methods Sparse kernel machines Graphical models Mixture models and EM Approximate inference Sampling methods Continuous latent variables Sequential data Combining models
octnp Octave not only for programmers 21 hours This course is intended for those who would like to learn an alternative to the commercial MATLAB package. The three-day training provides comprehensive information on navigating the Octave environment and using the package for data analysis and engineering calculations. The training is aimed at beginners, but also at those who already know the program and would like to systematize their knowledge and improve their skills. Knowledge of other programming languages is not required, but it will make the material much easier to learn. The course shows how to use the program through many practical examples. Introduction Simple calculations Starting Octave, Octave as a calculator, built-in functions The Octave environment Named variables, numbers and formatting, number representation and accuracy, loading and saving data Arrays and vectors Extracting elements from a vector, vector maths Plotting graphs Improving the presentation, multiple graphs and figures, saving and printing figures Octave programming I: Script files Creating and editing a script, running and debugging scripts Control statements If else, switch, for, while Octave programming II: Functions Matrices and vectors Matrix, the transpose operator, matrix creation functions, building composite matrices, matrices as tables, extracting bits of matrices, basic matrix functions Linear and Nonlinear Equations More graphs Putting several graphs in one window, 3D plots, changing the viewpoint, plotting surfaces, images and movies Eigenvectors and the Singular Value Decomposition Complex numbers Plotting complex numbers Statistics and data processing GUI Development
neo4j Beyond the relational database: neo4j 21 hours Relational, table-based databases such as Oracle and MySQL have long been the standard for organizing and storing data. However, the growing size and fluidity of data have made it difficult for these traditional systems to efficiently execute highly complex queries on the data. Imagine replacing rows-and-columns-based data storage with object-based data storage, whereby entities (e.g., a person) could be stored as data nodes, then easily queried on the basis of their vast, multi-linear relationship with other nodes. And imagine querying these connections and their associated objects and properties using a compact syntax, up to 20 times lighter than SQL. This is what graph databases, such as neo4j offer. In this hands-on course, we will set up a live project and put into practice the skills to model, manage and access your data. We contrast and compare graph databases with SQL-based databases as well as other NoSQL databases and clarify when and where it makes sense to implement each within your infrastructure. Audience Database administrators (DBAs) Data analysts Developers System Administrators DevOps engineers Business Analysts CTOs CIOs Format of the course Heavy emphasis on hands-on practice. Most of the concepts are learned through samples, exercises and hands-on development. Getting started with neo4j neo4j vs relational databases neo4j vs other NoSQL databases Using neo4j to solve real world problems Installing neo4j Data modeling with neo4j Mapping white-board diagrams and mind maps to neo4j Working with nodes Creating, changing and deleting nodes Defining node properties Node relationships Creating and deleting relationships Bi-directional relationships Querying your data with Cypher Querying your data based on relationships MATCH, RETURN, WHERE, REMOVE, MERGE, etc. 
Setting indexes and constraints Working with the REST API REST operations on nodes REST operations on relationships REST operations on indexes and constraints Accessing the core API for application development Working with .NET, Java, JavaScript, and Python APIs Closing remarks
aiint Artificial Intelligence Overview 7 hours This course has been created for managers, solutions architects, innovation officers, CTOs, software architects and anyone interested in an overview of applied artificial intelligence and the near-term forecast for its development. Artificial Intelligence History Intelligent Agents Problem Solving Solving Problems by Searching Beyond Classical Search Adversarial Search Constraint Satisfaction Problems Knowledge and Reasoning Logical Agents First-Order Logic Inference in First-Order Logic Classical Planning Planning and Acting in the Real World Knowledge Representation Uncertain Knowledge and Reasoning Quantifying Uncertainty Probabilistic Reasoning Probabilistic Reasoning over Time Making Simple Decisions Making Complex Decisions Learning Learning from Examples Knowledge in Learning Learning Probabilistic Models Reinforcement Learning Communicating, Perceiving, and Acting; Natural Language Processing Natural Language for Communication Perception Robotics Conclusions Philosophical Foundations AI: The Present and Future
bdbiga Big Data Business Intelligence for Govt. Agencies 35 hours Advances in technologies and the increasing amount of information are transforming how business is conducted in many industries, including government. Government data generation and digital archiving rates are on the rise due to the rapid growth of mobile devices and applications, smart sensors and devices, cloud computing solutions, and citizen-facing portals. As digital information expands and becomes more complex, information management, processing, storage, security, and disposition become more complex as well. New capture, search, discovery, and analysis tools are helping organizations gain insights from their unstructured data. The government market is at a tipping point, realizing that information is a strategic asset, and government needs to protect, leverage, and analyze both structured and unstructured information to better serve and meet mission requirements. As government leaders strive to evolve data-driven organizations to successfully accomplish mission, they are laying the groundwork to correlate dependencies across events, people, processes, and information. High-value government solutions will be created from a mashup of the most disruptive technologies: Mobile devices and applications Cloud services Social business technologies and networking Big Data and analytics IDC predicts that by 2020, the IT industry will reach $5 trillion, approximately $1.7 trillion larger than today, and that 80% of the industry's growth will be driven by these 3rd Platform technologies. In the long term, these technologies will be key tools for dealing with the complexity of increased digital information. Big Data is one of the intelligent industry solutions and allows government to make better decisions by taking action based on patterns revealed by analyzing large volumes of data — related and unrelated, structured and unstructured. 
But accomplishing these feats takes far more than simply accumulating massive quantities of data. “Making sense of these volumes of Big Data requires cutting-edge tools and technologies that can analyze and extract useful knowledge from vast and diverse streams of information,” Tom Kalil and Fen Zhao of the White House Office of Science and Technology Policy wrote in a post on the OSTP Blog. The White House took a step toward helping agencies find these technologies when it established the National Big Data Research and Development Initiative in 2012. The initiative included more than $200 million to make the most of the explosion of Big Data and the tools needed to analyze it. The challenges that Big Data poses are nearly as daunting as its promise is encouraging. Storing data efficiently is one of these challenges. As always, budgets are tight, so agencies must minimize the per-megabyte price of storage and keep the data within easy access so that users can get it when they want it and how they need it. Backing up massive quantities of data heightens the challenge. Analyzing the data effectively is another major challenge. Many agencies employ commercial tools that enable them to sift through the mountains of data, spotting trends that can help them operate more efficiently. (A recent study by MeriTalk found that federal IT executives think Big Data could help agencies save more than $500 billion while also fulfilling mission objectives.) Custom-developed Big Data tools also are allowing agencies to address the need to analyze their data. For example, the Oak Ridge National Laboratory’s Computational Data Analytics Group has made its Piranha data analytics system available to other agencies. The system has helped medical researchers find a link that can alert doctors to aortic aneurysms before they strike. It’s also used for more mundane tasks, such as sifting through résumés to connect job candidates with hiring managers.
Each session is 2 hours Day-1: Session-1: Business Overview of Why Big Data Business Intelligence in Govt. Case Studies from NIH, DoE Big Data adoption rate in Govt. Agencies and how they are aligning their future operation around Big Data Predictive Analytics Broad Scale Application Area in DoD, NSA, IRS, USDA etc. Interfacing Big Data with Legacy data Basic understanding of enabling technologies in predictive analytics Data Integration & Dashboard visualization Fraud management Business Rule/ Fraud detection generation Threat detection and profiling Cost benefit analysis for Big Data implementation Day-1: Session-2: Introduction of Big Data-1 Main characteristics of Big Data: volume, variety, velocity and veracity. MPP architecture for volume. Data Warehouses – static schema, slowly evolving dataset MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc. Hadoop Based Solutions – no conditions on structure of dataset. Typical pattern : HDFS, MapReduce (crunch), retrieve from HDFS Batch- suited for analytical/non-interactive Volume : CEP streaming data Typical choices – CEP products (e.g.
Infostreams, Apama, MarkLogic etc) Less production ready – Storm/S4 NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database Day-1: Session-3: Introduction to Big Data-2 NoSQL solutions KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB) KV Store - Dynamo, Voldemort, Dynomite, SubRecord, MotionDb, DovetailDB KV Store (Hierarchical) - GT.m, Cache KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord KV Cache - Memcached, Repcached, Coherence, Infinispan, EXtremeScale, JBossCache, Velocity, Terracoqua Tuple Store - Gigaspaces, Coord, Apache River Object Database - ZopeDB, db4o, Shoal Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI Varieties of Data: Introduction to Data Cleaning issue in Big Data RDBMS – static structure/schema, doesn’t promote agile, exploratory environment. NoSQL – semi structured, enough structure to store data without exact schema before storing data Data cleaning issues Day-1: Session-4: Big Data Introduction-3: Hadoop When to select Hadoop?
STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration) SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB) Warehousing data = HUGE effort and static even after implementation For variety & volume of data, crunched on commodity hardware – HADOOP Commodity H/W needed to create a Hadoop Cluster Introduction to Map Reduce /HDFS MapReduce – distribute computing over multiple servers HDFS – make data available locally for the computing process (with redundancy) Data – can be unstructured/schema-less (unlike RDBMS) Developer responsibility to make sense of data Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS Day-2: Session-1: Big Data Ecosystem-Building Big Data ETL: universe of Big Data Tools-which one to use and when? Hadoop vs. Other NoSQL solutions For interactive, random access to data Hbase (column oriented database) on top of Hadoop Random access to data but restrictions imposed (max 1 PB) Not good for ad-hoc analytics, good for logging, counting, time-series Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access) Flume – Stream data (e.g. 
log data) into HDFS Day-2: Session-2: Big Data Management System Moving parts, compute nodes start/fail: ZooKeeper - For configuration/coordination/naming services Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain Deploy, configure, cluster management, upgrade etc (sys admin): Ambari In Cloud: Whirr Day-2: Session-3: Predictive analytics in Business Intelligence-1: Fundamental Techniques & Machine learning based BI: Introduction to Machine learning Learning classification techniques Bayesian Prediction-preparing training file Support Vector Machine KNN p-Tree Algebra & vertical mining Neural Network Big Data large variable problem - Random forest (RF) Big Data Automation problem – Multi-model ensemble RF Automation through Soft10-M Text analytic tool-Treeminer Agile learning Agent based learning Distributed learning Introduction to Open source Tools for predictive analytics: R, Rapidminer, Mahout Day-2: Session-4: Predictive analytics eco-system-2: Common predictive analytic problems in Govt. Insight analytic Visualization analytic Structured predictive analytic Unstructured predictive analytic Threat/fraudster/vendor profiling Recommendation Engine Pattern detection Rule/Scenario discovery – failure, fraud, optimization Root cause discovery Sentiment analysis CRM analytic Network analytic Text Analytics Technology assisted review Fraud analytic Real Time Analytic Day-3: Session-1: Real Time and Scalable Analytic Over Hadoop Why common analytic algorithms fail in Hadoop/HDFS Apache Hama - for Bulk Synchronous distributed computing Apache Spark - for cluster computing for real time analytic CMU GraphLab - Graph based asynchronous approach to distributed computing KNN p-Algebra based approach from Treeminer for reduced hardware cost of operation Day-3: Session-2: Tools for eDiscovery and Forensics eDiscovery over Big Data vs.
Legacy data – a comparison of cost and performance Predictive coding and technology assisted review (TAR) Live demo of a Tar product ( vMiner) to understand how TAR works for faster discovery Faster indexing through HDFS –velocity of data NLP or Natural Language processing –various techniques and open source products eDiscovery in foreign languages-technology for foreign language processing Day-3 : Session 3: Big Data BI for Cyber Security –Understanding whole 360 degree views of speedy data collection to threat identification Understanding basics of security analytics-attack surface, security misconfiguration, host defenses Network infrastructure/ Large datapipe / Response ETL for real time analytic Prescriptive vs predictive – Fixed rule based vs auto-discovery of threat rules from Meta data Day-3: Session 4: Big Data in USDA : Application in Agriculture Introduction to IoT ( Internet of Things) for agriculture-sensor based Big Data and control Introduction to Satellite imaging and its application in agriculture Integrating sensor and image data for fertility of soil, cultivation recommendation and forecasting Agriculture insurance and Big Data Crop Loss forecasting Day-4 : Session-1: Fraud prevention BI from Big Data in Govt-Fraud analytic: Basic classification of Fraud analytics- rule based vs predictive analytics Supervised vs unsupervised Machine learning for Fraud pattern detection Vendor fraud/over charging for projects Medicare and Medicaid fraud- fraud detection techniques for claim processing Travel reimbursement frauds IRS refund frauds Case studies and live demo will be given wherever data is available. 
Day-4 : Session-2: Social Media Analytic- Intelligence gathering and analysis Big Data ETL API for extracting social media data Text, image, meta data and video Sentiment analysis from social media feed Contextual and non-contextual filtering of social media feed Social Media Dashboard to integrate diverse social media Automated profiling of social media profile Live demo of each analytic will be given through Treeminer Tool. Day-4 : Session-3: Big Data Analytic in image processing and video feeds Image Storage techniques in Big Data- Storage solution for data exceeding petabytes LTFS and LTO GPFS-LTFS ( Layered storage solution for Big image data) Fundamental of image analytics Object recognition Image segmentation Motion tracking 3-D image reconstruction Day-4: Session-4: Big Data applications in NIH: Emerging areas of Bio-informatics Meta-genomics and Big Data mining issues Big Data Predictive analytic for Pharmacogenomics, Metabolomics and Proteomics Big Data in downstream Genomics process Application of Big data predictive analytics in Public health Big Data Dashboard for quick accessibility of diverse data and display : Integration of existing application platform with Big Data Dashboard Big Data management Case Study of Big Data Dashboard: Tableau and Pentaho Use Big Data app to push location based services in Govt. Tracking system and management Day-5 : Session-1: How to justify Big Data BI implementation within an organization: Defining ROI for Big Data implementation Case studies for saving Analyst Time for collection and preparation of Data –increase in productivity gain Case studies of revenue gain from saving the licensed database cost Revenue gain from location based services Saving from fraud prevention An integrated spreadsheet approach to calculate approx. expense vs. Revenue gain/savings from Big Data implementation. 
Day-5 : Session-2: Step by Step procedure to replace legacy data system to Big Data System: Understanding practical Big Data Migration Roadmap What are the important information needed before architecting a Big Data implementation What are the different ways of calculating volume, velocity, variety and veracity of data How to estimate data growth Case studies Day-5: Session 4: Review of Big Data Vendors and review of their products. Q/A session: Accenture APTEAN (Formerly CDC Software) Cisco Systems Cloudera Dell EMC GoodData Corporation Guavus Hitachi Data Systems Hortonworks HP IBM Informatica Intel Jaspersoft Microsoft MongoDB (Formerly 10Gen) MU Sigma Netapp Opera Solutions Oracle Pentaho Platfora Qliktech Quantum Rackspace Revolution Analytics Salesforce SAP SAS Institute Sisense Software AG/Terracotta Soft10 Automation Splunk Sqrrl Supermicro Tableau Software Teradata Think Big Analytics Tidemark Systems Treeminer VMware (Part of EMC)
hadoopforprojectmgrs Hadoop for Project Managers 14 hours As more and more software and IT projects migrate from local processing and data management to distributed processing and big data storage, Project Managers are finding the need to upgrade their knowledge and skills to grasp the concepts and practices relevant to Big Data projects and opportunities. This course introduces Project Managers to the most popular Big Data processing framework: Hadoop. In this instructor-led training, participants will learn the core components of the Hadoop ecosystem and how these technologies can be used to solve large-scale problems. In learning these foundations, participants will also improve their ability to communicate with the developers and implementers of these systems as well as the data scientists and analysts that many IT projects involve. Audience Project Managers wishing to implement Hadoop into their existing development or IT infrastructure Project Managers needing to communicate with cross-functional teams that include big data engineers, data scientists and business analysts Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction Why and how project teams adopt Hadoop. How it all started The Project Manager's role in Hadoop projects Understanding Hadoop's architecture and key concepts HDFS MapReduce Other pieces of the Hadoop ecosystem What constitutes Big Data?
Different approaches to storing Big Data HDFS (Hadoop Distributed File System) as the foundation How Big Data is processed     The power of distributed processing Processing data with Map Reduce     How data is picked apart step by step The role of clustering in large-scale distributed processing     Architectural overview     Clustering approaches Clustering your data and processes with YARN The role of non-relational database in Big Data storage Working with Hadoop's non-relational database: HBase Data warehousing architectural overview Managing your data warehouse with Hive Running Hadoop from shell-scripts Working with Hadoop Streaming Other Hadoop tools and utilities Getting started on a Hadoop project     Demystifying complexity Migrating an existing project to Hadoop     Infrastructure considerations     Scaling beyond your allocated resources Hadoop project stakeholders and their toolkits     Developers, data scientists, business analysts and project managers Hadoop as a foundation for new technologies and approaches Closing remarks
hadoopdev Hadoop for Developers (4 days) 28 hours Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course introduces developers to the main components of the Hadoop ecosystem (HDFS, MapReduce, Pig, Hive and HBase). Section 1: Introduction to Hadoop hadoop history, concepts eco system distributions high level architecture hadoop myths hadoop challenges hardware / software lab : first look at Hadoop Section 2: HDFS Design and architecture concepts (horizontal scaling, replication, data locality, rack awareness) Daemons : Namenode, Secondary namenode, Data node communications / heart-beats data integrity read / write path Namenode High Availability (HA), Federation labs : Interacting with HDFS Section 3 : Map Reduce concepts and architecture daemons (MRV1) : jobtracker / tasktracker phases : driver, mapper, shuffle/sort, reducer Map Reduce Version 1 and Version 2 (YARN) Internals of Map Reduce Introduction to Java Map Reduce program labs : Running a sample MapReduce program Section 4 : Pig pig vs java map reduce pig job flow pig latin language ETL with Pig Transformations & Joins User defined functions (UDF) labs : writing Pig scripts to analyze data Section 5: Hive architecture and design data types SQL support in Hive Creating Hive tables and querying partitions joins text processing labs : various labs on processing data with Hive Section 6: HBase concepts and architecture hbase vs RDBMS vs cassandra HBase Java API Time series data on HBase schema design labs : Interacting with HBase using shell; programming in HBase Java API ; Schema design exercise
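The Map Reduce phases named in the Hadoop for Developers outline (driver, mapper, shuffle/sort, reducer) can be pictured with a small single-process sketch. This is not Hadoop code and does not use the Hadoop Java API; the function names `mapper`, `shuffle_sort`, `reducer` and `word_count` are illustrative assumptions, and everything runs in memory on one machine.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_sort(pairs):
    """Shuffle/sort phase: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Driver: wire the three phases together for a list of input lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle_sort(mapped))
```

On a real cluster the mapper and reducer calls would run on different nodes and the shuffle would move data across the network; the sketch only shows how data flows between the phases.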
systemml Apache SystemML for Machine Learning 14 hours Apache SystemML is a distributed and declarative machine learning platform. SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single node, in-memory computations, to distributed computations on Apache Hadoop and Apache Spark. Audience This course is suitable for Machine Learning researchers, developers and engineers seeking to utilize SystemML as a framework for machine learning. Running SystemML Standalone Spark MLContext Spark Batch Hadoop Batch JMLC Tools Debugger IDE Troubleshooting Languages and ML Algorithms DML PyDML Algorithms
apex Apache Apex: Processing big data-in-motion 21 hours Apache Apex is a YARN-native platform that unifies stream and batch processing. It processes big data-in-motion in a way that is scalable, performant, fault-tolerant, stateful, secure, distributed, and easily operable. This instructor-led, live training introduces Apache Apex's unified stream processing architecture and walks participants through the creation of a distributed application using Apex on Hadoop. By the end of this training, participants will be able to: Understand data processing pipeline concepts such as connectors for sources and sinks, common data transformations, etc. Build, scale and optimize an Apex application Process real-time data streams reliably and with minimum latency Use Apex Core and the Apex Malhar library to enable rapid application development Use the Apex API to write and re-use existing Java code Integrate Apex into other applications as a processing engine Tune, test and scale Apex applications Audience Developers Enterprise architects Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.  
openface OpenFace: Creating Facial Recognition Systems 14 hours OpenFace is open-source, real-time facial recognition software built on Python and Torch and based on Google’s FaceNet research. In this instructor-led, live training, participants will learn how to use OpenFace's components to create and deploy a sample facial recognition application. By the end of this training, participants will be able to: Work with OpenFace's components, including dlib, OpenCV, Torch, and nn4 to implement face detection, alignment, and transformation. Apply OpenFace to real-world applications such as surveillance, identity verification, virtual reality, gaming, and identifying repeat customers. Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
Neuralnettf Neural Networks Fundamentals using TensorFlow as Example 28 hours This course gives you a grounding in neural networks and, more generally, in machine learning and deep learning (algorithms and applications). The training focuses on fundamentals, but will also help you choose the right technology: TensorFlow, Caffe, Theano, DeepDrive, Keras, etc. The examples are written in TensorFlow. TensorFlow Basics Creation, Initializing, Saving, and Restoring TensorFlow variables Feeding, Reading and Preloading TensorFlow Data How to use TensorFlow infrastructure to train models at scale Visualizing and Evaluating models with TensorBoard TensorFlow Mechanics Inputs and Placeholders Build the Graph Inference Loss Training Train the Model The Graph The Session Train Loop Evaluate the Model Build the Eval Graph Eval Output The Perceptron Activation functions The perceptron learning algorithm Binary classification with the perceptron Document classification with the perceptron Limitations of the perceptron From the Perceptron to Support Vector Machines Kernels and the kernel trick Maximum margin classification and support vectors Artificial Neural Networks Nonlinear decision boundaries Feedforward and feedback artificial neural networks Multilayer perceptrons Minimizing the cost function Forward propagation Back propagation Improving the way neural networks learn Convolutional Neural Networks Goals Model Architecture Principles Code Organization Launching and Training the Model Evaluating a Model
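The perceptron learning algorithm listed in this outline can be sketched in a few lines of plain Python. This is only an illustrative sketch, not course material: the function names, the learning rate, and the AND data set are assumptions chosen for the example, and no TensorFlow is involved.

```python
# Minimal perceptron sketch (illustrative; names and data are assumptions).

def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: w <- w + lr * (target - prediction) * x."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:  # every sample classified correctly: stop early
            break
    return weights, bias

# Logical AND is linearly separable, so the rule converges on it.
AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(AND_DATA)
print([predict(weights, bias, x) for x, _ in AND_DATA])
```

A single perceptron can only draw one straight decision boundary, which is the limitation that motivates the multilayer networks later in the outline.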
processmining Process Mining 21 hours Process mining, or Automated Business Process Discovery (ABPD), is a technique that applies algorithms to event logs for the purpose of analyzing business processes. Process mining goes beyond data storage and data analysis; it bridges data with processes and provides insights into the trends and patterns that affect process efficiency. Format of the course The course starts with an overview of the most commonly used techniques for process mining. We discuss the various process discovery algorithms and tools used for discovering and modeling processes based on raw event data. Real-life case studies are examined and data sets are analyzed using the ProM open-source framework. Audience Data science professionals Anyone interested in understanding and applying process modeling and data mining Overview Discovering, analyzing and re-thinking your processes Types of process mining Discovery, conformance and enhancement Process mining workflow From log data analysis to response and action Other tools for process mining PMLAB, Apromore Commercial offerings Closing remarks
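As a rough illustration of what process discovery from raw event data involves, the sketch below counts how often one activity directly follows another within the same case. It is a simplified stand-in written for this example, not the ProM API; the function name and the sample log are assumptions.

```python
from collections import Counter, defaultdict

def directly_follows(event_log):
    """Count (activity_a, activity_b) pairs where b directly follows a
    within the same case. event_log is a list of (case_id, activity)
    tuples, assumed to be already sorted by timestamp."""
    traces = defaultdict(list)
    for case_id, activity in event_log:
        traces[case_id].append(activity)
    counts = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

# Hypothetical two-case event log.
LOG = [
    ("c1", "register"), ("c1", "check"), ("c1", "approve"),
    ("c2", "register"), ("c2", "check"), ("c2", "reject"),
]
dfg = directly_follows(LOG)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

Discovery algorithms typically start from a relation like this and then derive a process model from it.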
dsstne Amazon DSSTNE: Build a recommendation system 7 hours Amazon DSSTNE is an open-source library for training and deploying recommendation models. It allows models with weight matrices that are too large for a single GPU to be trained on a single host. In this instructor-led, live training, participants will learn how to use DSSTNE to build a recommendation application. By the end of this training, participants will be able to: Train a recommendation model with sparse datasets as input Scale training and prediction models over multiple GPUs Spread out computation and storage in a model-parallel fashion Generate Amazon-like personalized product recommendations Deploy a production-ready application that can scale at heavy workloads Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.  
d3js D3.js for Data Visualization 7 hours D3.js (or D3 for Data-Driven Documents) is a JavaScript library that uses SVG, HTML5, and CSS for producing dynamic, interactive data visualizations in web browsers. In this instructor-led, live training, participants will learn how to create web-based data-driven visualizations that run on multiple devices responsively. By the end of this training, participants will be able to: Use D3 to create interactive graphics, information dashboards, infographics and maps Control HTML with jQuery-like selections Transform the DOM by selecting elements and joining to data Export SVG for use in print publications Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction Overview of the data visualization process Data visualization components: HTML, CSS, Javascript, DOM, D3, SVG D3 methods: scaling, events, transitions, and animations Attaching your data to DOM (Document Object Model) elements Using CSS3, HTML, and/or SVG to showcase data Making data interactive with D3.js data-driven transformations and transitions Working with layouts Exporting SVG Closing remarks
smtwebint Semantic Web Overview 7 hours The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Semantic Web Overview Introduction Purpose Standards Ontology Projects Resource Description Framework (RDF) Introduction Motivation and Goals RDF Concepts RDF Vocabulary URI and Namespace (Normative) Datatypes (Normative) Abstract Syntax (Normative) Fragment Identifiers
bdbitcsp Big Data Business Intelligence for Telecom and Communication Service Providers 35 hours Overview Communications service providers (CSP) are facing pressure to reduce costs and maximize average revenue per user (ARPU), while ensuring an excellent customer experience, but data volumes keep growing. Global mobile data traffic will grow at a compound annual growth rate (CAGR) of 78 percent to 2016, reaching 10.8 exabytes per month. Meanwhile, CSPs are generating large volumes of data, including call detail records (CDR), network data and customer data. Companies that fully exploit this data gain a competitive edge. According to a recent survey by The Economist Intelligence Unit, companies that use data-directed decision-making enjoy a 5-6% boost in productivity. Yet 53% of companies leverage only half of their valuable data, and one-fourth of respondents noted that vast quantities of useful data go untapped. The data volumes are so high that manual analysis is impossible, and most legacy software systems can’t keep up, resulting in valuable data being discarded or ignored. With Big Data & Analytics’ high-speed, scalable big data software, CSPs can mine all their data for better decision making in less time. Different Big Data products and techniques provide an end-to-end software platform for collecting, preparing, analyzing and presenting insights from big data. Application areas include network performance monitoring, fraud detection, customer churn detection and credit risk analysis. Big Data & Analytics products scale to handle terabytes of data but implementation of such tools need new kind of cloud based database system like Hadoop or massive scale parallel computing processor ( KPU etc.) This course work on Big Data BI for Telco covers all the emerging new areas in which CSPs are investing for productivity gain and opening up new business revenue stream. 
The course provides a complete 360-degree overview of Big Data BI in Telco, so that decision makers and managers gain a broad and comprehensive view of the possibilities Big Data BI offers for productivity and revenue gain. Course objectives The main objective of the course is to introduce new Big Data business intelligence techniques in 4 sectors of Telecom Business (Marketing/Sales, Network Operation, Financial operation and Customer Relation Management). Students will be introduced to the following: Introduction to Big Data - what the 4Vs (volume, velocity, variety and veracity) are in Big Data - generation, extraction and management from a Telco perspective How Big Data analytics differs from legacy data analytics In-house justification of Big Data - Telco perspective Introduction to the Hadoop ecosystem - familiarity with Hadoop tools like Hive, Pig, Spark – when and how they are used to solve Big Data problems How Big Data is extracted for analysis by analytics tools - how business analysts can reduce their pain points in collecting and analyzing data through an integrated Hadoop dashboard approach Basic introduction to insight analytics, visualization analytics and predictive analytics for Telco Customer churn analytics and Big Data - how Big Data analytics can reduce customer churn and customer dissatisfaction in Telco - case studies Network failure and service failure analytics from network meta-data and IPDR Financial analysis - fraud, wastage and ROI estimation from sales and operational data Customer acquisition problem - target marketing, customer segmentation and cross-sale from sales data Introduction and summary of all Big Data analytics products and where they fit into the Telco analytics space Conclusion - how to take a step-by-step approach to introducing Big Data Business Intelligence in your organization Target Audience Network operation, Financial Managers, CRM managers and top IT managers in Telco CIO office. 
Business Analysts in Telco CFO office managers/analysts Operational managers QA managers Breakdown of topics on daily basis: (Each session is 2 hours) Day-1: Session -1: Business Overview of Why Big Data Business Intelligence in Telco. Case Studies from T-Mobile, Verizon etc. Big Data adaptation rate in North American Telco & and how they are aligning their future business model and operation around Big Data BI Broad Scale Application Area Network and Service management Customer Churn Management Data Integration & Dashboard visualization Fraud management Business Rule generation Customer profiling Localized Ad pushing Day-1: Session-2 : Introduction of Big Data-1 Main characteristics of Big Data-volume, variety, velocity and veracity. MPP architecture for volume. Data Warehouses – static schema, slowly evolving dataset MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc. Hadoop Based Solutions – no conditions on structure of dataset. Typical pattern : HDFS, MapReduce (crunch), retrieve from HDFS Batch- suited for analytical/non-interactive Volume : CEP streaming data Typical choices – CEP products (e.g. 
Infostreams, Apama, MarkLogic etc) Less production ready – Storm/S4 NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database Day-1 : Session -3 : Introduction to Big Data-2 NoSQL solutions KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB) KV Store - Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB KV Store (Hierarchical) - GT.m, Cache KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord KV Cache - Memcached, Repcached, Coherence, Infinispan, EXtremeScale, JBossCache, Velocity, Terracoqua Tuple Store - Gigaspaces, Coord, Apache River Object Database - ZopeDB, DB40, Shoal Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Prsevere, Riak-Basho, Scalaris Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI Varieties of Data: Introduction to Data Cleaning issue in Big Data RDBMS – static structure/schema, doesn’t promote agile, exploratory environment. NoSQL – semi structured, enough structure to store data without exact schema before storing data Data cleaning issues Day-1 : Session-4 : Big Data Introduction-3 : Hadoop When to select Hadoop? 
STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration) SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB) Warehousing data = HUGE effort and static even after implementation For variety & volume of data, crunched on commodity hardware – HADOOP Commodity H/W needed to create a Hadoop Cluster Introduction to Map Reduce /HDFS MapReduce – distribute computing over multiple servers HDFS – make data available locally for the computing process (with redundancy) Data – can be unstructured/schema-less (unlike RDBMS) Developer responsibility to make sense of data Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS Day-2: Session-1.1: Spark : In-memory distributed processing What is “In memory” processing? Spark SQL Spark SDK Spark API RDD Spark Lib Hanna How to migrate an existing Hadoop system to Spark Day-2 Session -1.2: Storm -Real time processing in Big Data Streams Spouts Bolts Topologies Day-2: Session-2: Big Data Management System Moving parts, compute nodes start/fail :ZooKeeper - For configuration/coordination/naming services Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain Deploy, configure, cluster management, upgrade etc (sys admin) :Ambari In Cloud : Whirr Evolving Big Data platform tools for tracking ETL layer application issues Day-2: Session-3: Predictive analytics in Business Intelligence -1: Fundamental Techniques & Machine learning based BI : Introduction to Machine learning Learning classification techniques Bayesian Prediction-preparing training file Markov random field Supervised and unsupervised learning Feature extraction Support Vector Machine Neural Network Reinforcement learning Big Data large variable problem -Random forest (RF) Representation learning Deep learning Big Data Automation problem – Multi-model ensemble RF Automation through Soft10-M LDA and topic modeling Agile learning 
Agent based learning- Example from Telco operation Distributed learning –Example from Telco operation Introduction to Open source Tools for predictive analytics : R, Rapidminer, Mahout More scalable Analytic-Apache Hama, Spark and CMU GraphLab Day-2: Session-4 Predictive analytics eco-system-2: Common predictive analytic problems in Telecom Insight analytic Visualization analytic Structured predictive analytic Unstructured predictive analytic Customer profiling Recommendation Engine Pattern detection Rule/Scenario discovery –failure, fraud, optimization Root cause discovery Sentiment analysis CRM analytic Network analytic Text Analytics Technology assisted review Fraud analytic Real Time Analytic Day-3 : Session-1 : Network Operation analytic- root cause analysis of network failures, service interruption from meta data, IPDR and CRM: CPU Usage Memory Usage QoS Queue Usage Device Temperature Interface Error IoS versions Routing Events Latency variations Syslog analytics Packet Loss Load simulation Topology inference Performance Threshold Device Traps IPDR ( IP detailed record) collection and processing Use of IPDR data for Subscriber Bandwidth consumption, Network interface utilization, modem status and diagnostic HFC information Day-3: Session-2: Tools for Network service failure analysis: Network Summary Dashboard: monitor overall network deployments and track your organization's key performance indicators Peak Period Analysis Dashboard: understand the application and subscriber trends driving peak utilization, with location-specific granularity Routing Efficiency Dashboard: control network costs and build business cases for capital projects with a complete understanding of interconnect and transit relationships Real-Time Entertainment Dashboard: access metrics that matter, including video views, duration, and video quality of experience (QoE) IPv6 Transition Dashboard: investigate the ongoing adoption of IPv6 on your network and gain insight into the applications 
and devices driving trends Case-Study-1: The Alcatel-Lucent Big Network Analytics (BNA) Data Miner Multi-dimensional mobile intelligence (m.IQ6) Day-3 : Session 3: Big Data BI for Marketing/Sales –Understanding sales/marketing from Sales data: ( All of them will be shown with a live predictive analytic demo ) To identify highest velocity clients To identify clients for a given products To identify right set of products for a client ( Recommendation Engine) Market segmentation technique Cross-Sale and upsale technique Client segmentation technique Sales revenue forecasting technique Day-3: Session 4: BI needed for Telco CFO office: Overview of Business Analytics works needed in a CFO office Risk analysis on new investment Revenue, profit forecasting New client acquisition forecasting Loss forecasting Fraud analytic on finances ( details next session ) Day-4 : Session-1: Fraud prevention BI from Big Data in Telco-Fraud analytic: Bandwidth leakage / Bandwidth fraud Vendor fraud/over charging for projects Customer refund/claims frauds Travel reimbursement frauds Day-4 : Session-2: From Churning Prediction to Churn Prevention: 3 Types of Churn : Active/Deliberate , Rotational/Incidental, Passive Involuntary 3 classification of churned customers: Total, Hidden, Partial Understanding CRM variables for churn Customer behavior data collection Customer perception data collection Customer demographics data collection Cleaning CRM Data Unstructured CRM data ( customer call, tickets, emails) and their conversion to structured data for Churn analysis Social Media CRM-new way to extract customer satisfaction index Case Study-1 : T-Mobile USA: Churn Reduction by 50% Day-4 : Session-3: How to use predictive analysis for root cause analysis of customer dis-satisfaction : Case Study -1 : Linking dissatisfaction to issues – Accounting, Engineering failures like service interruption, poor bandwidth service Case Study-2: Big Data QA dashboard to track customer satisfaction index from 
various parameters such as call escalations, criticality of issues, pending service interruption events etc. Day-4: Session-4: Big Data Dashboard for quick accessibility of diverse data and display : Integration of existing application platform with Big Data Dashboard Big Data management Case Study of Big Data Dashboard: Tableau and Pentaho Use Big Data app to push location based Advertisement Tracking system and management Day-5 : Session-1: How to justify Big Data BI implementation within an organization: Defining ROI for Big Data implementation Case studies for saving Analyst Time for collection and preparation of Data –increase in productivity gain Case studies of revenue gain from customer churn Revenue gain from location based and other targeted Ad An integrated spreadsheet approach to calculate approx. expense vs. Revenue gain/savings from Big Data implementation. Day-5 : Session-2: Step by Step procedure to replace legacy data system to Big Data System: Understanding practical Big Data Migration Roadmap What are the important information needed before architecting a Big Data implementation What are the different ways of calculating volume, velocity, variety and veracity of data How to estimate data growth Case studies in 2 Telco Day-5: Session 3 & 4: Review of Big Data Vendors and review of their products. Q/A session: Accenture Alcatel-Lucent Amazon –A9 APTEAN (Formerly CDC Software) Cisco Systems Cloudera Dell EMC GoodData Corporation Guavus Hitachi Data Systems Hortonworks Huawei HP IBM Informatica Intel Jaspersoft Microsoft MongoDB (Formerly 10Gen) MU Sigma Netapp Opera Solutions Oracle Pentaho Platfora Qliktech Quantum Rackspace Revolution Analytics Salesforce SAP SAS Institute Sisense Software AG/Terracotta Soft10 Automation Splunk Sqrrl Supermicro Tableau Software Teradata Think Big Analytics Tidemark Systems VMware (Part of EMC)
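The MapReduce pattern introduced on Day 1 of this course (distribute computation, then combine results) can be illustrated with a single-process word count. This is a conceptual sketch only: it runs in plain Python on one machine, uses no Hadoop API, and all function names are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    for line in records:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(records):
    return reduce_phase(shuffle(map_phase(records)))

print(word_count(["call dropped", "call completed", "call dropped"]))
```

In a real cluster the map and reduce steps run on many servers and the shuffle moves data between them; the logic per key stays the same.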
datashrinkgov Data Shrinkage for Government 14 hours Why shrink data Relational databases Introduction Aggregation and disaggregation Normalisation and denormalisation Null values and zeroes Joining data Complex joins Cluster analysis Applications Strengths and weaknesses Measuring distance Hierarchical clustering K-means and derivatives Applications in Government Factor analysis Concepts Exploratory factor analysis Confirmatory factor analysis Principal component analysis Correspondence analysis Software Applications in Government Predictive analytics Timelines and naming conventions Holdout samples Weights of evidence Information value Scorecard building demonstration using a spreadsheet Regression in predictive analytics Logistic regression in predictive analytics Decision Trees in predictive analytics Neural networks Measuring accuracy Applications in Government
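Weights of evidence (WoE) and information value (IV), both listed under predictive analytics in this outline, reduce a binned variable to one number per bin and one number overall. The sketch below uses one common convention, WoE = ln(share of goods / share of bads); other texts invert the ratio. The function name and the sample counts are assumptions, and every bin is assumed to contain at least one good and one bad case.

```python
import math

def woe_iv(bins):
    """bins maps a bin label to (good_count, bad_count).
    Returns (woe_by_bin, information_value).
    Assumes every count is greater than zero."""
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe = {}
    iv = 0.0
    for label, (good, bad) in bins.items():
        dist_good = good / total_good   # share of all goods in this bin
        dist_bad = bad / total_bad      # share of all bads in this bin
        woe[label] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[label]
    return woe, iv

# Hypothetical counts for a two-bin variable.
woe, iv = woe_iv({"low": (80, 20), "high": (20, 80)})
print(woe, iv)
```

A variable whose bins separate goods from bads strongly gets a high IV, which is how the measure is used to decide which variables are worth keeping in a scorecard.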
hbasedev HBase for Developers 21 hours This course introduces HBase – a NoSQL store on top of Hadoop.  The course is intended for developers who will be using HBase to develop applications,  and administrators who will manage HBase clusters. We will walk a developer through HBase architecture and data modelling and application development on HBase. It will also discuss using MapReduce with HBase, and some administration topics, related to performance optimization. The course  is very  hands-on with lots of lab exercises. Duration : 3 days Audience : Developers  & Administrators Section 1: Introduction to Big Data & NoSQL Big Data ecosystem NoSQL overview CAP theorem When is NoSQL appropriate Columnar storage HBase and NoSQL Section 2 : HBase Intro Concepts and Design Architecture (HMaster and Region Server) Data integrity HBase ecosystem Lab : Exploring HBase Section 3 : HBase Data model Namespaces, Tables and Regions Rows, columns, column families, versions HBase Shell and Admin commands Lab : HBase Shell Section 3 : Accessing HBase using Java API Introduction to Java API Read / Write path Time Series data Scans Map Reduce Filters Counters Co-processors Labs (multiple) : Using HBase Java API to implement  time series , Map Reduce, Filters and counters. 
Section 4 : HBase schema Design : Group session students are presented with real world use cases students work in groups to come up with design solutions discuss / critique and learn from multiple designs Labs : implement a scenario in HBase Section 5 : HBase Internals Understanding HBase under the hood Memfile / HFile / WAL HDFS storage Compactions Splits Bloom Filters Caches Diagnostics Section 6 : HBase installation and configuration hardware selection install methods common configurations Lab : installing HBase Section 7 : HBase eco-system developing applications using HBase interacting with other Hadoop stack (MapReduce, Pig, Hive) frameworks around HBase advanced concepts (co-processors) Labs : writing HBase applications Section 8 : Monitoring And Best Practices monitoring tools and practices optimizing HBase HBase in the cloud real world use cases of HBase Labs : checking HBase vitals
tfir TensorFlow for Image Recognition 28 hours This course explores, with specific examples, the application of TensorFlow to image recognition. Audience This course is intended for engineers seeking to use TensorFlow for image recognition. After completing this course, delegates will be able to: understand TensorFlow’s structure and deployment mechanisms carry out installation / production environment / architecture tasks and configuration assess code quality, perform debugging, monitoring implement advanced production like training models, building graphs and logging Machine Learning and Recurrent Neural Networks (RNN) basics NN and RNN Backpropagation Long short-term memory (LSTM) TensorFlow Basics Creation, Initializing, Saving, and Restoring TensorFlow variables Feeding, Reading and Preloading TensorFlow Data How to use TensorFlow infrastructure to train models at scale Visualizing and Evaluating models with TensorBoard TensorFlow Mechanics 101 Tutorial Files Prepare the Data Download Inputs and Placeholders Build the Graph Inference Loss Training Train the Model The Graph The Session Train Loop Evaluate the Model Build the Eval Graph Eval Output Advanced Usage Threading and Queues Distributed TensorFlow Writing Documentation and Sharing your Model Customizing Data Readers Using GPUs¹ Manipulating TensorFlow Model Files TensorFlow Serving Introduction Basic Serving Tutorial Advanced Serving Tutorial Serving Inception Model Tutorial Convolutional Neural Networks Overview Goals Highlights of the Tutorial Model Architecture Code Organization CIFAR-10 Model Model Inputs Model Prediction Model Training Launching and Training the Model Evaluating a Model Training a Model Using Multiple GPU Cards¹ Placing Variables and Operations on Devices Launching and Training the Model on Multiple GPU cards Deep Learning for MNIST Setup Load MNIST Data Start TensorFlow InteractiveSession Build a Softmax Regression Model Placeholders Variables 
Predicted Class and Cost Function Train the Model Evaluate the Model Build a Multilayer Convolutional Network Weight Initialization Convolution and Pooling First Convolutional Layer Second Convolutional Layer Densely Connected Layer Readout Layer Train and Evaluate the Model Image Recognition Inception-v3 C++ Java ¹ Topics related to the use of GPUs are not available as a part of a remote course. They can be delivered during classroom-based courses, but only by prior agreement, and only if both the trainer and all participants have laptops with supported NVIDIA GPUs, with 64-bit Linux installed (not provided by NobleProg). NobleProg cannot guarantee the availability of trainers with the required hardware.
vespa Vespa: Serving large-scale data in real-time 14 hours Vespa is an open-source big data processing and serving engine created by Yahoo. It is used to respond to user queries, make recommendations, and provide personalized content and advertisements in real-time. This instructor-led, live training introduces the challenges of serving large-scale data and walks participants through the creation of an application that can compute responses to user requests, over large datasets in real-time. By the end of this training, participants will be able to: Use Vespa to quickly compute data (store, search, rank, organize) at serving time while a user waits Implement Vespa into existing applications involving feature search, recommendations, and personalization Integrate and deploy Vespa with existing big data systems such as Hadoop and Storm. Audience Developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
t2t T2T: Creating Sequence to Sequence models for generalized learning 7 hours Tensor2Tensor (T2T) is a modular, extensible library for training AI models in different tasks, using different types of training data, for example: image recognition, translation, parsing, image captioning, and speech recognition. It is maintained by the Google Brain team. In this instructor-led, live training, participants will learn how to prepare a deep-learning model to resolve multiple tasks. By the end of this training, participants will be able to: Install tensor2tensor, select a data set, and train and evaluate an AI model Customize a development environment using the tools and components included in Tensor2Tensor Create and use a single model to concurrently learn a number of tasks from multiple domains Use the model to learn from tasks with a large amount of training data and apply that knowledge to tasks where data is limited Obtain satisfactory processing results using a single GPU Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
cassadmin Cassandra Administration 14 hours This course will introduce Cassandra –  a popular NoSQL database.  It will cover Cassandra principles, architecture and data model.   Students will learn data modeling  in CQL (Cassandra Query Language) in hands-on, interactive labs.  This session also discusses Cassandra internals and some admin topics. Section 1: Introduction to Big Data / NoSQL NoSQL overview CAP theorem When is NoSQL appropriate Columnar storage NoSQL ecosystem Section 2 : Cassandra Basics Design and architecture Cassandra nodes, clusters, datacenters Keyspaces, tables, rows and columns Partitioning, replication, tokens Quorum and consistency levels Labs : interacting with cassandra using CQLSH Section 3: Data Modeling – part 1 introduction to CQL CQL Datatypes creating keyspaces & tables Choosing columns and types Choosing primary keys Data layout for rows and columns Time to live (TTL) Querying with CQL CQL updates Collections (list / map / set) Labs : various data modeling exercises using CQL ; experimenting with queries and supported data types Section 4: Data Modeling – part 2 Creating and using secondary indexes composite keys (partition keys and clustering keys) Time series data Best practices for time series data Counters Lightweight transactions (LWT) Labs : creating and using indexes;  modeling time series data Section 5 : Cassandra Internals understand Cassandra design under the hood sstables, memtables, commit log Section 6: Administration Hardware selection Cassandra distributions Cassandra Nodes Communication Writing and Reading data to/from the storage engine Data directories Anti-entropy operations Cassandra Compaction Choosing and Implementing compaction strategies Cassandra best practices (compaction, garbage collection,) troubleshooting tools and tips Lab : students install Cassandra, run benchmarks
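The "Quorum and consistency levels" topic in this outline rests on simple arithmetic: a quorum is a majority of the replicas, and a read is guaranteed to overlap the latest write when the read and write replica counts together exceed the replication factor. The helper names below are assumptions for illustration; they are not part of any Cassandra driver.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

rf = 3
q = quorum(rf)
print("quorum for RF=3:", q)
print("QUORUM reads + QUORUM writes overlap:", reads_see_latest_write(q, q, rf))
print("ONE read + ONE write overlap:", reads_see_latest_write(1, 1, rf))
```

With a replication factor of 3, quorum reads and writes each touch 2 replicas, so the cluster tolerates one replica being down while still returning the latest write.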
cassdev Cassandra for Developers 21 hours This course will introduce Cassandra –  a popular NoSQL database.  It will cover Cassandra principles, architecture and data model.   Students will learn data modeling  in CQL (Cassandra Query Language) in hands-on, interactive labs.  This session also discusses Cassandra internals and some admin topics. Audience : Developers Section 1: Introduction to Big Data / NoSQL NoSQL overview CAP theorem When is NoSQL appropriate Columnar storage NoSQL ecosystem Section 2 : Cassandra Basics Design and architecture Cassandra nodes, clusters, datacenters Keyspaces, tables, rows and columns Partitioning, replication, tokens Quorum and consistency levels Labs : interacting with cassandra using CQLSH Section 3: Data Modeling – part 1 introduction to CQL CQL Datatypes creating keyspaces & tables Choosing columns and types Choosing primary keys Data layout for rows and columns Time to live (TTL) Querying with CQL CQL updates Collections (list / map / set) Labs : various data modeling exercises using CQL ; experimenting with queries and supported data types Section 4: Data Modeling – part 2 Creating and using secondary indexes composite keys (partition keys and clustering keys) Time series data Best practices for time series data Counters Lightweight transactions (LWT) Labs : creating and using indexes;  modeling time series data Section 5 : Data Modeling Labs  : Group design session multiple use cases from various domains are presented students work in groups to come up with designs and models discuss various designs, analyze decisions Lab : implement one of the scenarios Section 6: Cassandra drivers Introduction to Java driver CRUD (Create / Read / Update / Delete) operations using Java client Asynchronous queries Labs : using Java API for Cassandra Section 7 : Cassandra Internals understand Cassandra design under the hood sstables, memtables, commit log read path / write path caching vnodes Section 8: Administration Hardware selection Cassandra 
distributions Cassandra best practices (compaction, garbage collection,) troubleshooting tools and tips Lab : students install Cassandra, run benchmarks Section 9:  Bonus Lab (time permitting) Implement a music service like Pandora / Spotify on Cassandra
droolsdslba Drools 6 and DSL for Business Analysts 21 hours This 3-day course introduces Drools 6 to Business Analysts responsible for writing tests and rules. The course focuses on creating pure logic. After this course, analysts can write tests and logic which developers can then integrate with business applications. Short introduction to rule engines Short history of Expert Systems and Rules Engine What is Artificial Intelligence? Forward vs Backward chaining Declarative vs procedure/oop Comparison of solutions When to use rule engines? When not to use rule engines? Alternatives to rule engines KIE Declarative vs Traditional Fact Model Executing simple rules with simple tests Authoring Assets Decision tables Rule Templates Guided rule editor Testing, limits and benefits Developing simple process with rules Writing rules in Eclipse Stateless vs Stateful sessions Selecting proper facts Basic operators and Drools specific operators Basic accumulate functions (sum, max, etc.) Intermediate calculations Inserting new facts Exercises (lots of them) Ordering rules with BPMN Salience Ruleflow vs BPMN 2.0 Executing ruleset from a process Rules vs gateways Short overview of BPMN 2.0 features (transactions, exception handling) Comprehensive declarative business logic in Drools Domain Specific Languages (DSL) Creating new languages Preparing DSL to be used by managers Basic Natural Language Processing (NLP) with DSL Strategies for writing DSL from rules Strategies for writing rules from DSL written by analysts Unit testing Test strategies (test per case or per rule) Executing tests automatically
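Forward chaining, mentioned in the introduction to rule engines above, means starting from known facts and firing rules until no rule can add anything new. The sketch below shows the idea in plain Python. It is not Drools, DRL, or the matching algorithm Drools uses internally; the rule format and the insurance-style facts are assumptions invented for this example.

```python
def forward_chain(facts, rules):
    """Naive forward chaining. Each rule is (conditions, conclusion), where
    conditions is a frozenset of facts that must all be known before the
    conclusion is added. Repeats until a full pass adds no new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and conditions <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical business rules.
RULES = [
    (frozenset({"age_under_25", "sports_car"}), "high_risk"),
    (frozenset({"high_risk"}), "premium_surcharge"),
    (frozenset({"no_claims_5_years"}), "loyalty_discount"),
]
result = forward_chain({"age_under_25", "sports_car"}, RULES)
print(sorted(result))
```

Backward chaining works in the opposite direction: it starts from a goal such as "premium_surcharge" and searches for rules and facts that would establish it.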
neuralnet Introduction to the use of neural networks 7 hours The training is aimed at people who want to learn the basics of neural networks and their applications. The Basics Can computers think? Imperative and declarative approaches to solving problems Purpose of research on artificial intelligence The definition of artificial intelligence. Turing test. Other determinants The development of the concept of intelligent systems Most important achievements and directions of development Neural Networks The Basics Concept of neurons and neural networks A simplified model of the brain Capabilities of a neuron XOR problem and the nature of the distribution of values The polymorphic nature of the sigmoid function Other activation functions Construction of neural networks Concept of connecting neurons Neural network as nodes Building a network Neurons Layers Weights Input and output data Range 0 to 1 Normalization Learning Neural Networks Backward propagation Propagation steps Network training algorithms Range of application Estimation Problems that can be solved by approximation Examples XOR problem Lotto? Stocks OCR and image pattern recognition Other applications Implementing a neural network model for predicting stock prices of listed companies Problems for today Combinatorial explosion and gaming issues Turing test again Over-confidence in the capabilities of computers
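To make the XOR problem, sigmoid activation and 0-to-1 normalization items from the outline above more concrete, here is a minimal Python sketch. It is not from the course: the weights are hand-picked for illustration rather than learned by backpropagation, and all names are hypothetical. It shows that one hidden layer of two sigmoid neurons is enough to represent XOR, which a single neuron cannot do.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid activation: squashes any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def min_max_normalize(values):
    """Rescale a list of numbers linearly into the range 0 to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant input: nothing to scale
    return [(v - low) / (high - low) for v in values]


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, then sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Hand-picked (not trained) weights: the hidden layer approximates
# OR and NAND, and the output neuron approximates AND of the two.
HIDDEN_LAYER = [((20.0, 20.0), -10.0),    # behaves like OR
                ((-20.0, -20.0), 30.0)]   # behaves like NAND
OUTPUT_NEURON = ((20.0, 20.0), -30.0)     # behaves like AND


def xor_network(x1: float, x2: float) -> float:
    """Forward pass through the two-layer network."""
    hidden = [neuron((x1, x2), w, b) for w, b in HIDDEN_LAYER]
    weights, bias = OUTPUT_NEURON
    return neuron(hidden, weights, bias)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(xor_network(a, b)))
```

In a real training run the weights would be found by backward propagation instead of being set by hand; the fixed values only keep the example deterministic.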
rneuralnet Neural Network in R 14 hours This course is an introduction to applying neural networks to real-world problems using R-project software. Introduction to Neural Networks What are Neural Networks What is the current status of applying neural networks Neural Networks vs regression models Supervised and Unsupervised learning Overview of packages available nnet, neuralnet and others Differences between packages and their limitations Visualizing neural networks Applying Neural Networks Concept of neurons and neural networks A simplified model of the brain Capabilities of a neuron XOR problem and the nature of the distribution of values The polymorphic nature of the sigmoid function Other activation functions Construction of neural networks Concept of connecting neurons Neural network as nodes Building a network Neurons Layers Weights Input and output data Range 0 to 1 Normalization Learning Neural Networks Backward propagation Propagation steps Network training algorithms Range of application Estimation Problems that can be solved by approximation Examples OCR and image pattern recognition Other applications Implementing a neural network model for predicting stock prices of listed companies
matlab2 MATLAB Fundamentals 21 hours This three-day course provides a comprehensive introduction to the MATLAB technical computing environment. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include: Working with the MATLAB user interface Entering commands and creating variables Analyzing vectors and matrices Visualizing vector and matrix data Working with data files Working with data types Automating commands with scripts Writing programs with logic and flow control Writing functions Part 1 A Brief Introduction to MATLAB Objectives: Offer an overview of what MATLAB is, what it consists of, and what it can do for you An Example: C vs. MATLAB MATLAB Product Overview MATLAB Application Fields What can MATLAB do for you? The Course Outline Working with the MATLAB User Interface Objective: Get an introduction to the main features of the MATLAB integrated design environment and its user interfaces. Get an overview of course themes. MATLAB Interface Reading data from file Saving and loading variables Plotting data Customizing plots Calculating statistics and best-fit line Exporting graphics for use in other applications Variables and Expressions Objective: Enter MATLAB commands, with an emphasis on creating and accessing data in variables. Entering commands Creating variables Getting help Accessing and modifying values in variables Creating character variables Analysis and Visualization with Vectors Objective: Perform mathematical and statistical calculations with vectors, and create basic visualizations. See how MATLAB syntax enables calculations on whole data sets with a single command.
Calculations with vectors Plotting vectors Basic plot options Annotating plots Analysis and Visualization with Matrices Objective: Use matrices as mathematical objects or as collections of (vector) data. Understand the appropriate use of MATLAB syntax to distinguish between these applications. Size and dimensionality Calculations with matrices Statistics with matrix data Plotting multiple columns Reshaping and linear indexing Multidimensional arrays Part 2 Automating Commands with Scripts Objective: Collect MATLAB commands into scripts for ease of reproduction and experimentation. As the complexity of your tasks increases, entering long sequences of commands in the Command Window becomes impractical. A Modelling Example The Command History Creating script files Running scripts Comments and Code Cells Publishing scripts Working with Data Files Objective: Bring data into MATLAB from formatted files. Because imported data can be of a wide variety of types and formats, emphasis is given to working with cell arrays and date formats. Importing data Mixed data types Cell arrays Conversions amongst numerals, strings, and cells Exporting data Multiple Vector Plots Objective: Make more complex vector plots, such as multiple plots, and use color and string manipulation techniques to produce eye-catching visual representations of data. Graphics structure Multiple figures, axes, and plots Plotting equations Using color Customizing plots Logic and Flow Control Objective: Use logical operations, variables, and indexing techniques to create flexible code that can make decisions and adapt to different situations. Explore other programming constructs for repeating sections of code, and constructs that allow interaction with the user. Logical operations and variables Logical indexing Programming constructs Flow control Loops Matrix and Image Visualization Objective: Visualize images and matrix data in two or three dimensions. 
Explore the difference in displaying images and visualizing matrix data using images. Scattered Interpolation using vector and matrix data 3-D matrix visualization 2-D matrix visualization Indexed images and colormaps True color images Part 3 Data Analysis Objective: Perform typical data analysis tasks in MATLAB, including developing and fitting theoretical models to real-life data. This leads naturally to one of the most powerful features of MATLAB: solving linear systems of equations with a single command. Dealing with missing data Correlation Smoothing Spectral analysis and FFTs Solving linear systems of equations Writing Functions Objective: Increase automation by encapsulating modular tasks as user-defined functions. Understand how MATLAB resolves references to files and variables. Why functions? Creating functions Adding comments Calling subfunctions Workspaces Subfunctions Path and precedence Data Types Objective: Explore data types, focusing on the syntax for creating variables and accessing array elements, and discuss methods for converting among data types. Data types differ in the kind of data they may contain and the way the data is organized. MATLAB data types Integers Structures Converting types File I/O Objective: Explore the low-level data import and export functions in MATLAB that allow precise control over text and binary file I/O. These functions include textscan, which provides precise control of reading text files. Opening and closing files Reading and writing text files Reading and writing binary files Note that the course as delivered might differ slightly from the outline above without prior notification. Conclusion
Objectives: Summarise what we have learnt A summary of the course Other upcoming courses on MATLAB
solrdev Solr for Developers 21 hours This course introduces students to the Solr platform. Through a combination of lecture, discussion and labs, students will gain hands-on experience configuring effective search and indexing. The class begins with basic Solr installation and configuration, then teaches attendees the search features of Solr. Students will gain experience with faceting, indexing and search relevance, among other features central to the Solr platform. The course wraps up with a number of advanced topics including spell checking, suggestions, Multicore and SolrCloud. Duration: 3 days Audience: Developers, business users, administrators Overall Goal Provide experienced web developers and technical staff with a comprehensive introduction to the Solr search platform. Give software developers in-depth skills for creating search solutions. I. Fundamentals Solr Overview Installing and running Solr Adding content to Solr Reading a Solr XML response Changing parameters in the URL Using the browse interface Labs: install Solr, run queries II. Searching Sorting results Query parsers More queries Hardwiring request parameters Adding fields to default search Faceting Result grouping Labs: advanced queries, experiment with faceted search III. Indexing Adding your own content to Solr Deleting data from Solr Building a bookstore search Adding book data Exploring the book data Dedupe update processor Labs: indexing various document collections IV. Schema Updating Adding fields to the schema Analyzing text Labs: customize Solr schema V. Relevance Field weighting Phrase queries Function queries Fuzzier search Sounds-like Labs: implementing queries for relevance VI. Extended features More-like-this Geospatial Spell checking Suggestions Highlighting Pseudo-fields Pseudo-joins Multilanguage Labs: implementing spell checking and suggestions VII. Multicore Adding more kinds of data Labs: creating and administering cores VIII. 
SolrCloud Introduction How SolrCloud works Commit strategies ZooKeeper Managing Solr config files Labs: administer SolrCloud IX. Developing with Solr API Talking to Solr through REST Configuration Indexing and searching Solr and Spring Labs: code to read and write Solr index, exercise in Spring with Solr X. Developing with Lucene API Building a Lucene index Searching, viewing, debugging Extracting text with Tika Scaling Lucene indices on clusters Lucene performance tuning Labs: coding with Lucene XI. Conclusion Other approaches to search ElasticSearch DataStax Enterprise: Solr+Cassandra Cloudera Solr integration Blur Future directions
kylin Apache Kylin: From classic OLAP to real-time data warehouse 14 hours Apache Kylin is an extreme, distributed analytics engine for big data. In this instructor-led live training, participants will learn how to use Apache Kylin to set up a real-time data warehouse. By the end of this training, participants will be able to: Consume real-time streaming data using Kylin Utilize Apache Kylin's powerful features, including snowflake schema support, a rich SQL interface, spark cubing and subsecond query latency Note We use the latest version of Kylin (as of this writing, Apache Kylin v2.0) Audience Big data engineers Big Data analysts Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
ApacheIgnite Apache Ignite: Improve speed, scale and availability with in-memory computing 14 hours Apache Ignite is an in-memory computing platform that sits between the application and data layer to improve speed, scale and availability. In this instructor-led, live training, participants will learn the principles behind persistent and pure in-memory storage as they step through the creation of a sample in-memory computing project. By the end of this training, participants will be able to: Use Ignite for in-memory, on-disk persistence as well as a purely distributed in-memory database Achieve persistence without syncing data back to a relational database Use Ignite to carry out SQL and distributed joins Improve performance by moving data closer to the CPU, using RAM as a storage Spread data sets across a cluster to achieve horizontal scalability Integrate Ignite with RDBMS, NoSQL, Hadoop and machine learning processors Audience Developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
jupyter Jupyter for Data Science Teams 7 hours Jupyter is an open-source, web-based interactive IDE and computing environment. This instructor-led, live training introduces the idea of collaborative development in data science and demonstrates how to use Jupyter to track and participate as a team in the "life cycle of a computational idea". It walks participants through the creation of a sample data science project built on top of the Jupyter ecosystem. By the end of this training, participants will be able to: Install and configure Jupyter, including the creation and integration of a team repository on Git Use Jupyter features such as extensions, interactive widgets, multiuser mode and more to enable project collaboration Create, share and organize Jupyter Notebooks with team members Choose from Scala, Python and R to write and execute code against big data systems such as Apache Spark, all through the Jupyter interface Audience Data science teams Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Note The Jupyter Notebook supports over 40 languages including R, Python, Scala, Julia, etc. To customize this course to your language(s) of choice, please contact us to arrange. To request a customized course outline for this training, please contact us.
fsharpfordatascience F# for Data Science 21 hours Data science is the application of statistical analysis, machine learning, data visualization and programming for the purpose of understanding and interpreting real-world data. F# is a programming language well suited to data science, as it combines efficient execution, REPL scripting, powerful libraries and scalable data integration. In this instructor-led, live training, participants will learn how to use F# to solve a series of real-world data science problems. By the end of this training, participants will be able to: Use F#'s integrated data science packages Use F# to interoperate with other languages and platforms, including Excel, R, Matlab, and Python Use the Deedle package to solve time series problems Carry out advanced analysis with minimal lines of production-quality code Understand how functional programming is a natural fit for scientific and big data computations Access and visualize data with F# Apply F# for machine learning Explore solutions for problems in domains such as business intelligence and social gaming Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
patternmatching Pattern Matching 14 hours Pattern Matching is a technique used to locate specified patterns within an image. It can be used to determine the existence of specified characteristics within a captured image, for example the expected label on a defective product in a factory line or the specified dimensions of a component. It is different from "Pattern Recognition" (which recognizes general patterns based on larger collections of related samples) in that it specifically dictates what we are looking for, then tells us whether the expected pattern exists or not. Audience Engineers and developers seeking to develop machine vision applications Manufacturing engineers, technicians and managers Format of the course This course introduces the approaches, technologies and algorithms used in the field of pattern matching as it applies to Machine Vision. Introduction Computer Vision Machine Vision Pattern Matching vs Pattern Recognition Alignment Features of the target object Points of reference on the object Determining position Determining orientation Gauging Setting tolerance levels Measuring lengths, diameters, angles, and other dimensions Rejecting a component Inspection Detecting flaws Adjusting the system Closing remarks
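The idea described in the Pattern Matching entry above, stating exactly what pattern is expected and then reporting whether it is present, can be sketched in a few lines of Python. This is only one simple approach (an exhaustive search scored by the sum of absolute differences), chosen here for illustration; it is not necessarily the algorithm taught in the course, and the function names and the tolerance parameter are hypothetical.

```python
def match_template(image, template):
    """Slide the template over the image and return (position, score).

    The score is the sum of absolute pixel differences; 0 is an exact match.
    Both arguments are lists of rows of grayscale values.
    """
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_pos, best_score = None, None
    for top in range(image_h - tmpl_h + 1):
        for left in range(image_w - tmpl_w + 1):
            score = sum(
                abs(image[top + r][left + c] - template[r][c])
                for r in range(tmpl_h)
                for c in range(tmpl_w)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score


def pattern_present(image, template, tolerance):
    """Decide whether the expected pattern exists within a tolerance level."""
    _, score = match_template(image, template)
    return score <= tolerance


if __name__ == "__main__":
    captured = [
        [0, 0, 0, 0],
        [0, 9, 8, 0],
        [0, 7, 9, 0],
        [0, 0, 0, 0],
    ]
    expected_label = [[9, 8], [7, 9]]
    print(match_template(captured, expected_label))
    print(pattern_present(captured, expected_label, tolerance=2))
```

The returned position corresponds to the alignment step in the outline, and the tolerance check corresponds to accepting or rejecting a component.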
drools7dslba Drools 7 and DSL for Business Analysts 21 hours This 3-day course introduces Drools 7 to Business Analysts responsible for writing tests and rules. The course focuses on creating pure logic. After this course, analysts can write tests and logic that developers can then integrate with business applications. Short introduction to rule engines Short history of Expert Systems and Rules Engines What is Artificial Intelligence? Forward vs Backward chaining Declarative vs procedural/OOP Comparison of solutions When to use rule engines? When not to use rule engines? Alternatives to rule engines KIE Declarative vs Traditional Fact Model Executing simple rules with simple tests Authoring Assets Decision tables Rule Templates Guided rule editor Testing, limits and benefits Developing simple process with rules Writing rules in Eclipse Stateless vs Stateful sessions Selecting proper facts Basic operators and Drools-specific operators Basic accumulate functions (sum, max, etc.) Intermediate calculations Inserting new facts Exercises (lots of them) Ordering rules with BPMN Salience Ruleflow vs BPMN 2.0 Executing ruleset from a process Rules vs gateways Short overview of BPMN 2.0 features (transactions, exception handling) Comprehensive declarative business logic in Drools Domain Specific Languages (DSL) Creating new languages Preparing DSL to be used by managers Basic Natural Language Processing (NLP) with DSL Strategies for writing DSL from rules Strategies for writing rules from DSL written by analysts Unit testing Test strategies (test per case or per rule) Executing tests automatically
bigdatar Programming with Big Data in R 21 hours Introduction to Programming Big Data with R (pbdR) Setting up your environment to use pbdR Scope and tools available in pbdR Packages commonly used with Big Data alongside pbdR Message Passing Interface (MPI) Using pbdR MPI 5 Parallel processing Point-to-point communication Send Matrices Summing Matrices Collective communication Summing Matrices with Reduce Scatter / Gather Other MPI communications Distributed Matrices Creating a distributed diagonal matrix SVD of a distributed matrix Building a distributed matrix in parallel Statistics Applications Monte Carlo Integration Reading Datasets Reading on all processes Broadcasting from one process Reading partitioned data Distributed Regression Distributed Bootstrap
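The "Monte Carlo Integration" and "Summing ... with Reduce" items in the outline above follow a common pattern: each process computes a partial result on its own samples, and a reduce step combines them. The sketch below imitates that pattern in plain, single-process Python; it does not use pbdR or MPI, the "workers" are simulated by a loop, and all names are hypothetical. It estimates pi by integrating over the unit quarter circle.

```python
import random


def partial_hits(samples: int, seed: int) -> int:
    """Work done by one simulated worker: count random points in the unit
    square that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)  # each worker gets its own random stream
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers: int = 4, samples_per_worker: int = 50_000) -> float:
    """Combine the partial counts, as a sum-reduce would, and scale to pi."""
    partials = [partial_hits(samples_per_worker, seed=rank)
                for rank in range(workers)]
    total_hits = sum(partials)  # stands in for a reduce with a sum operation
    return 4.0 * total_hits / (workers * samples_per_worker)


if __name__ == "__main__":
    print("estimate of pi:", estimate_pi())
```

In a distributed setting each rank would run the partial step on its own node and only the final counts would be communicated, which is why this kind of computation parallelizes well.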
apacheh Administrator Training for Apache Hadoop 35 hours Audience: The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment Goal: Deep knowledge on Hadoop cluster administration. 1: HDFS (17%) Describe the function of HDFS Daemons Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing. Identify current features of computing systems that motivate a system like Apache Hadoop. Classify major goals of HDFS Design Given a scenario, identify appropriate use case for HDFS Federation Identify components and daemon of an HDFS HA-Quorum cluster Analyze the role of HDFS security (Kerberos) Determine the best data serialization choice for a given scenario Describe file read and write paths Identify the commands to manipulate files in the Hadoop File System Shell 2: YARN and MapReduce version 2 (MRv2) (17%) Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons Understand basic design strategy for MapReduce v2 (MRv2) Determine how YARN handles resource allocations Identify the workflow of MapReduce job running on YARN Determine which files you must change and how in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN. 3: Hadoop Cluster Planning (16%) Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster. 
Analyze the choices in selecting an OS Understand kernel tuning and disk swapping Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario 4: Hadoop Cluster Installation and Administration (25%) Given a scenario, identify how the cluster will handle disk and machine failures Analyze a logging configuration and logging configuration file format Understand the basics of Hadoop metrics and cluster health monitoring Identify the function and purpose of available tools for cluster monitoring Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Manager, Sqoop, Hive, and Pig Identify the function and purpose of available tools for managing the Apache Hadoop file system 5: Resource Management (10%) Understand the overall design goals of each of Hadoop schedulers Given a scenario, determine how the FIFO Scheduler allocates cluster resources Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN Given a scenario, determine how the Capacity Scheduler allocates cluster resources 6: Monitoring and Logging (15%) Understand the functions and features of Hadoop’s metric collection abilities Analyze the NameNode and JobTracker Web UIs Understand how to monitor cluster Daemons Identify and monitor CPU usage on master nodes Describe how to monitor swap and memory allocation on all nodes Identify how to view and manage 
Hadoop’s log files Interpret a log file
mlrobot1 Machine Learning for Robotics 21 hours This course introduces machine learning methods in robotics applications. It is a broad overview of existing methods, motivations and main ideas in the context of pattern recognition. After a short theoretical background, participants will perform simple exercises using open source (usually R) or any other popular software. Regression Probabilistic Graphical Models Boosting Kernel Methods Gaussian Processes Evaluation and Model Selection Sampling Methods Clustering CRFs Random Forests IVMs
storm Apache Storm 28 hours Apache Storm is a distributed, real-time computation engine used for enabling real-time business intelligence. It does so by enabling applications to reliably process unbounded streams of data (a.k.a. stream processing). "Storm is for real-time processing what Hadoop is for batch processing!" In this instructor-led live training, participants will learn how to install and configure Apache Storm, then develop and deploy an Apache Storm application for processing big data in real-time. Topics covered in this training include: Apache Storm in the context of Hadoop Working with unbounded data Continuous computation Real-time analytics Distributed RPC and ETL processing Audience Software and ETL developers Mainframe professionals Data scientists Big data analysts Hadoop professionals Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Request a customized course outline for this training!
jenetics Jenetics 21 hours Jenetics is an advanced Genetic Algorithm, respectively an Evolutionary Algorithm, library written in modern day Java. Audience This course is directed at Researchers seeking to utilize Jenetics in their projects   Introduction Architecture Base Classes Domain Classes Operation Classes Engine Classes Nuts and Bolts Concurrency Randomness Serialization Utility Classes Extending Jenetics  Genes Chromosomes Selectors Alterers Statistics Engine Advanced Topics Encoding Codec Problem Validation Termination Evolution Performance Internals PRNG Testing Random Seeding Incubation Weasel Program Examples Ones Counting Real Function Rastrigin Function 0/1 knapsack Travelling salesman Evolving Images Build  
snorkel Snorkel: Rapidly process training data 7 hours Snorkel is a system for rapidly creating, modeling, and managing training data. It focuses on accelerating the development of structured or "dark" data extraction applications for domains in which large labeled training sets are not available or easy to obtain. In this instructor-led, live training, participants will learn techniques for extracting value from unstructured data such as text, tables, figures, and images through modeling of training data with Snorkel. By the end of this training, participants will be able to: Programmatically create training sets to enable the labeling of massive training sets Train high-quality end models by first modeling noisy training sets Use Snorkel to implement weak supervision techniques and apply data programming to weakly-supervised machine learning systems Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.  
fiji Fiji: Introduction to scientific image processing 21 hours Fiji is an open-source image processing package that bundles ImageJ (an image processing program for scientific multidimensional images) and a number of plugins for scientific image analysis. In this instructor-led, live training, participants will learn how to use the Fiji distribution and its underlying ImageJ program to create an image analysis application. By the end of this training, participants will be able to: Use Fiji's advanced programming features and software components to extend ImageJ Stitch large 3D images from overlapping tiles Automatically update a Fiji installation on startup using the integrated update system Select from a broad selection of scripting languages to build custom image analysis solutions Use Fiji's powerful libraries, such as ImgLib, on large bioimage datasets Deploy their application and collaborate with other scientists on similar projects Audience Scientists Researchers Developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
bpmndmncmmn BPMN, DMN, and CMMN - OMG standards for process improvement 28 hours Business Process Model and Notation (BPMN), Decision Model and Notation (DMN) and Case Management Model and Notation (CMMN) are three Object Management Group (OMG) standards for process, decision, and case modelling. This course provides an introduction to all of them and explains when to use each. Introduction to Standards BPMN, DMN, and CMMN - what are those standards about? When should we use BPMN? When should we use DMN? When should we use CMMN? Business Process Model and Notation (BPMN) Basic BPMN Symbols in Examples Activity Gateways Events Sequence Flow Message Artifacts Modeling Collaboration Pool, Participants Lanes Message Flow How to model messages Activities Activity vs Task Human Interactions Types of Tasks Sub-Process Call Activity Loop Characteristics and Multi-Instance Items and Data Data Modeling Events Concepts Start and End Events Intermediate Events Trigger Types of Events Message Timer Error Escalation Cancel Compensation Link Gateways Sequence Flow Considerations Exclusive Gateway Inclusive Gateway Parallel Gateway Event-Based Gateway Parallel Event-Based Gateway Complex Gateway Decision Model and Notation (DMN) Introduction to DMN Short history Basic concepts Decision requirements Decision log Scope and uses of DMN (human and automated decision making) Decision Requirements DRG DRD Decision Table Simple Expression Language (S-FEEL) FEEL Case Management Model and Notation (CMMN) Case Management Elements Core Infrastructure Case Model Elements Case and Role Information Model Elements Plan Model Elements Artifacts Notation Case Case Plan Models Case File Items Stages Entry and Exit Criterion Plan Fragments Tasks Milestones Event Listeners Links Planning Table Decorators Artifacts
kdd Knowledge Discovery in Databases (KDD) 21 hours Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing. In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes. Audience Data analysts or anyone interested in learning how to interpret data to solve problems Format of the course After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations. Introduction KDD vs data mining Establishing the application domain Establishing relevant prior knowledge Understanding the goal of the investigation Creating a target data set Data cleaning and preprocessing Data reduction and projection Choosing the data mining task Choosing the data mining algorithms Interpreting the mined patterns
drools6int Introduction to Drools 6 for Developers 21 hours This 3-day course introduces Drools 6 to developers. It does not cover Drools integration, performance or any other complex topics. Short introduction to rule engines Short history of Expert Systems and Rules Engines What is Artificial Intelligence? Forward vs Backward chaining Declarative vs procedural/OOP Comparison of solutions When to use rule engines? When not to use rule engines? Alternatives to rule engines KIE Authoring Assets Workbench Integration Executing rules directly from KIE Deployment Decision tables Rule Templates Guided rule editor Testing Work Items Versioning and deployment A bit more about repository (git) Developing simple process with rules Writing rules in Eclipse Stateless vs Stateful sessions Selecting proper facts Basic operators and Drools-specific operators Basic accumulate functions (sum, max, etc.) Intermediate calculations Inserting new facts Exercises (lots of them) Ordering rules with BPMN Salience Ruleflow vs BPMN 2.0 Executing ruleset from a process Rules vs gateways Short overview of BPMN 2.0 features (transactions, exception handling) Comprehensive declarative business logic in Drools Domain Specific Languages (DSL) Creating new languages Preparing DSL to be used by managers Basic Natural Language Processing (NLP) with DSL Fusion (CEP), temporal reasoning (for events to happen after, between, etc.) Fusion operators Example in Event Schedules Unit testing Optional Topics OptaPlanner jBPM Drools and integration via web services Drools integration via command line How to change rules/process after deployment without compiling
psr Introduction to Recommendation Systems 7 hours Audience Marketing department employees, IT strategists and other people involved in decisions related to the design and implementation of recommender systems. Format Short theoretical background followed by analysis of working examples and short, simple exercises. Challenges related to data collection Information overload Data types (video, text, structured data, etc.) Potential of the data now and in the near future Basics of Data Mining Recommendation and searching Searching and Filtering Sorting Determining weights of the search results Using Synonyms Full-text search Long Tail Chris Anderson idea Drawbacks of Long Tail Determining Similarities Products Users Documents and web sites Content-Based Recommendation and measurement of similarities Cosine distance The Euclidean distance vectors TF-IDF and frequency of terms Collaborative filtering Community rating Graphs Applications of graphs  Determining similarity of graphs Similarity between users Neural Networks Basic concepts of Neural Networks Training Data and Validation Data Neural Network examples in recommender systems How to encourage users to share their data Making systems more comfortable Navigation Functionality and UX Case Studies Popularity of recommender systems and their problems Examples
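The similarity measures named in the outline above (cosine distance and Euclidean distance) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the vectors stand for any numeric representation of products, users or documents, and the course itself may use different tools.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an all-zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(a, b)
```

Cosine distance ignores vector length, so two users whose ratings differ only in scale come out as close to each other, whereas Euclidean distance treats them as far apart.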
droolsrlsadm Drools Rules Administration 21 hours This course has been prepared for people who are involved in administering corporate knowledge assets (rules, processes), such as system administrators, system integrators and application server administrators. We are using the newest stable community version of Drools to run this course, but older versions are also possible if agreed before booking. Drools Administration Short Introduction to Rule Engines Artificial Intelligence Expert Systems What is a Rule Engine? Why use a Rule Engine? Advantages of a Rule Engine When should you use a Rule Engine? Scripting or Process Engines When you should NOT use a Rule Engine Strong and Loose Coupling What are rules? Where things are Managing rules in a jar file Git repository Executing rules from KIE Managing BPMN and workflows files Moving knowledge files (rules, processes, forms, work times...) Rules Testing Where to store test How to execute tests Testing with JUnit Deployment Strategies stand alone application Invoking rules from Java Code integration via files (json, xml, etc...) integration via web services using KIE for integration Administration of rules authoring Packages Artifact Repository Asset Editor Validation Data Model Categories versioning Domain Specific Languages Optimizing hardware and software for rules execution Multithreading and Drools Kie Projects structures Lifecycles Building Deploying Running Installation and Deployment Cheat Sheets Organization Units Users, Rules and Permissions Authentication Repositories Backup and Restore Logging
deeplearning1 Introduction to Deep Learning 21 hours This course is a general overview of Deep Learning without going too deep into any specific methods. It is suitable for people who want to start using Deep Learning to improve the accuracy of their predictions. Backprop, modular models Logsum module RBF Net MAP/MLE loss Parameter Space Transforms Convolutional Module Gradient-Based Learning  Energy for inference, Objective for learning PCA; NLL:  Latent Variable Models Probabilistic LVM Loss Function Handwriting recognition
glusterfs GlusterFS for System Administrators 21 hours GlusterFS is an open-source distributed file storage system that can scale up to petabytes of capacity. GlusterFS is designed to provide additional space depending on the user's storage requirements. A common application for GlusterFS is cloud computing storage systems. In this instructor-led training, participants will learn how to use normal, off-the-shelf hardware to create and deploy a storage system that is scalable and always available.  By the end of the course, participants will be able to: Install, configure, and maintain a full-scale GlusterFS system. Implement large-scale storage systems in different types of environments. Audience System administrators Storage administrators Format of the Course Part lecture, part discussion, exercises and heavy hands-on practice. Introduction to GlusterFS     Terminologies used Overview of GlusterFS architecture Installing GlusterFS Controlling and monitoring the installed GlusterFS Using the Gluster Console Manager Creating the Trusted Storage Pools Understanding the volume types Creating the GlusterFS client Understanding geo-replication Managing the GlusterFS volume, client, geo-replication and directory quota GlusterFS workload monitoring Accessing the control lists Monitoring the unified file and object storage Monitoring the Hadoop compatible storage Discussing the snapshots GlusterFS troubleshooting Closing Remarks
spmllib Apache Spark MLlib 35 hours MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs. It divides into two packages: spark.mllib contains the original API built on top of RDDs. spark.ml provides a higher-level API built on top of DataFrames for constructing ML pipelines.   Audience This course is directed at engineers and developers seeking to utilize the built-in machine learning library for Apache Spark. spark.mllib: data types, algorithms, and utilities Data types Basic statistics summary statistics correlations stratified sampling hypothesis testing streaming significance testing random data generation Classification and regression linear models (SVMs, logistic regression, linear regression) naive Bayes decision trees ensembles of trees (Random Forests and Gradient-Boosted Trees) isotonic regression Collaborative filtering alternating least squares (ALS) Clustering k-means Gaussian mixture power iteration clustering (PIC) latent Dirichlet allocation (LDA) bisecting k-means streaming k-means Dimensionality reduction singular value decomposition (SVD) principal component analysis (PCA) Feature extraction and transformation Frequent pattern mining FP-growth association rules PrefixSpan Evaluation metrics PMML model export Optimization (developer) stochastic gradient descent limited-memory BFGS (L-BFGS) spark.ml: high-level APIs for ML pipelines Overview: estimators, transformers and pipelines Extracting, transforming and selecting features Classification and regression Clustering Advanced topics
datameer Datameer for Data Analysts 14 hours Datameer is a business intelligence and analytics platform built on Hadoop. It allows end-users to access, explore and correlate large-scale, structured, semi-structured and unstructured data in an easy-to-use fashion. In this instructor-led, live training, participants will learn how to use Datameer to overcome Hadoop's steep learning curve as they step through the setup and analysis of a series of big data sources. By the end of this training, participants will be able to: Create, curate, and interactively explore an enterprise data lake Access business intelligence data warehouses, transactional databases and other analytic stores Use a spreadsheet user-interface to design end-to-end data processing pipelines Access pre-built functions to explore complex data relationships Use drag-and-drop wizards to visualize data and create dashboards Use tables, charts, graphs, and maps to analyze query results Audience Data analysts Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
kdbplusandq kdb+ and q: Analyze time series data 21 hours kdb+ is an in-memory, column-oriented database and q is its built-in, interpreted vector-based language. In kdb+, tables are columns of vectors and q is used to perform operations on the table data as if it was a list. kdb+ and q are commonly used in high frequency trading and are popular with the major financial institutions, including Goldman Sachs, Morgan Stanley, Merrill Lynch, JP Morgan, etc. In this instructor-led, live training, participants will learn how to create a time series data application using kdb+ and q. By the end of this training, participants will be able to: Understand the difference between a row-oriented database and a column-oriented database Select data, write scripts and create functions to carry out advanced analytics Analyze time series data such as stock and commodity exchange data Use kdb+'s in-memory capabilities to store, analyze, process and retrieve large data sets at high speed Think of functions and data at a higher level than the standard function(arguments) approach common in non-vector languages Explore other time-sensitive applications for kdb+, including energy trading, telecommunications, sensor data, log data, and machine and network usage monitoring Audience Developers Database engineers Data scientists Data analysts Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
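The difference between a row-oriented and a column-oriented layout, mentioned in the description above, can be illustrated with a minimal Python sketch. This is not q code and does not use kdb+; the names `trades_rows`, `trades_cols`, `vwap` and `select_where` and the sample values are hypothetical, chosen only to show why a column layout suits whole-column (vector) operations.

```python
# Row-oriented layout: one record per trade (hypothetical sample data).
trades_rows = [
    {"sym": "AAA", "price": 10.0, "size": 100},
    {"sym": "BBB", "price": 20.0, "size": 50},
    {"sym": "AAA", "price": 11.0, "size": 200},
]

# Column-oriented layout: one list (vector) per column.
trades_cols = {
    "sym": [r["sym"] for r in trades_rows],
    "price": [r["price"] for r in trades_rows],
    "size": [r["size"] for r in trades_rows],
}


def vwap(cols):
    """Volume-weighted average price computed over whole columns."""
    notional = sum(p * s for p, s in zip(cols["price"], cols["size"]))
    return notional / sum(cols["size"])


def select_where(cols, column, value):
    """Return a new column table keeping rows where cols[column] == value."""
    keep = [i for i, v in enumerate(cols[column]) if v == value]
    return {name: [vec[i] for i in keep] for name, vec in cols.items()}


aaa = select_where(trades_cols, "sym", "AAA")
```

In the column layout an aggregate such as `vwap` touches only the `price` and `size` vectors, while the row layout would have to walk every full record.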
mlios Machine Learning on iOS 14 hours In this instructor-led, live training, participants will learn how to use the iOS Machine Learning (ML) technology stack as they step through the creation and deployment of an iOS mobile app. By the end of this training, participants will be able to: Create a mobile app capable of image processing, text analysis and speech recognition Access pre-trained ML models for integration into iOS apps Create a custom ML model Add Siri Voice support to iOS apps Understand and use frameworks such as Core ML, Vision, CoreGraphics, and GameplayKit Use languages and tools such as Python, Keras, Caffe, TensorFlow, scikit-learn, libsvm, Anaconda, and Spyder Audience Developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
scylladb Scylla database 21 hours Scylla is an open-source distributed NoSQL data store. It is compatible with Apache Cassandra but performs at significantly higher throughputs and lower latencies. In this course, participants will learn about Scylla's features and architecture while obtaining practical experience with setting up, administering, monitoring, and troubleshooting Scylla.   Audience     Database administrators     Developers     System Engineers Format of the course     The course is interactive and includes discussions of the principles and approaches for deploying and managing Scylla distributed databases and clusters. The course includes a heavy component of hands-on exercises and practice. Introduction to Scylla Installing and running Scylla Understanding distributed databases Scylla's data model and architecture Working with CQL (Cassandra Query Language) Setting up a Scylla cluster Scylla tools Database administration Troubleshooting Scylla
drools7int Introduction to Drools 7 for Developers 21 hours This 3-day course introduces Drools 7 to developers. It does not cover Drools integration, performance or other complex topics. Short introduction to rule engines Short history of Expert Systems and Rules Engines What is Artificial Intelligence? Forward vs Backward chaining Declarative vs procedure/oop Comparison of solutions When to use rule engines? When not to use rule engines? Alternatives to rule engines KIE Authoring Assets Workbench Integration Executing rules directly from KIE Deployment Decision tables Rule Templates Guided rule editor Testing Work Items Versioning and deployment A bit more about repository (git) Developing simple process with rules Writing rules in Eclipse Stateless vs Stateful sessions Selecting proper facts Basic operators and Drools specific operators Basic accumulate functions (sum, max, etc.) Intermediate calculations Inserting new facts Exercises (lots of them) Ordering rules with BPMN Salience Ruleflow vs BPMN 2.0 Executing ruleset from a process Rules vs gateways Short overview of BPMN 2.0 features (transactions, exception handling) Comprehensive declarative business logic in Drools Domain Specific Languages (DSL) Creating new languages Preparing DSL to be used by managers Basic Natural Language Processing (NLP) with DSL Fusion (CEP), temporal reasoning (for events to happen after, between, etc.) Fusion operators Example in Event Schedules Unit testing Optional Topics OptaPlanner jBPM Drools and integration via web services Drools integration via command line How to change rules/process after deployment without compiling
68736 Hadoop for Developers (2 days) 14 hours Introduction What is Hadoop? What does it do? How does it do it? The Motivation for Hadoop Problems with Traditional Large-Scale Systems Introducing Hadoop Hadoopable Problems Hadoop: Basic Concepts and HDFS The Hadoop Project and Hadoop Components The Hadoop Distributed File System Introduction to MapReduce MapReduce Overview Example: WordCount Mappers Reducers Hadoop Clusters and the Hadoop Ecosystem Hadoop Cluster Overview Hadoop Jobs and Tasks Other Hadoop Ecosystem Components Writing a MapReduce Program in Java Basic MapReduce API Concepts Writing MapReduce Drivers, Mappers, and Reducers in Java Speeding Up Hadoop Development by Using Eclipse Differences Between the Old and New MapReduce APIs Writing a MapReduce Program Using Streaming Writing Mappers and Reducers with the Streaming API Unit Testing MapReduce Programs Unit Testing The JUnit and MRUnit Testing Frameworks Writing Unit Tests with MRUnit Running Unit Tests Delving Deeper into the Hadoop API Using the ToolRunner Class Setting Up and Tearing Down Mappers and Reducers Decreasing the Amount of Intermediate Data with Combiners Accessing HDFS Programmatically Using The Distributed Cache Using the Hadoop API’s Library of Mappers, Reducers, and Partitioners Practical Development Tips and Techniques Strategies for Debugging MapReduce Code Testing MapReduce Code Locally by Using LocalJobRunner Writing and Viewing Log Files Retrieving Job Information with Counters Reusing Objects Creating Map-Only MapReduce Jobs Partitioners and Reducers How Partitioners and Reducers Work Together Determining the Optimal Number of Reducers for a Job Writing Custom Partitioners Data Input and Output Creating Custom Writable and Writable-Comparable Implementations Saving Binary Data Using SequenceFile and Avro Data Files Issues to Consider When Using File Compression Implementing Custom InputFormats and OutputFormats Common MapReduce Algorithms Sorting and Searching Large Data
Sets Indexing Data Computing Term Frequency — Inverse Document Frequency Calculating Word Co-Occurrence Performing Secondary Sort Joining Data Sets in MapReduce Jobs Writing a Map-Side Join Writing a Reduce-Side Join Integrating Hadoop into the Enterprise Workflow Integrating Hadoop into an Existing Enterprise Loading Data from an RDBMS into HDFS by Using Sqoop Managing Real-Time Data Using Flume Accessing HDFS from Legacy Systems with FuseDFS and HttpFS An Introduction to Hive, Impala, and Pig The Motivation for Hive, Impala, and Pig Hive Overview Impala Overview Pig Overview Choosing Between Hive, Impala, and Pig An Introduction to Oozie Introduction to Oozie Creating Oozie Workflows
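The WordCount example listed in the Hadoop for Developers outline can be sketched without Hadoop to show how the mapper, shuffle and reducer stages fit together. The following is a plain-Python illustration under stated assumptions: it does not use the Hadoop API (the course itself works in Java and the Streaming API), and the function names `mapper`, `shuffle`, `reducer` and `word_count` are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    """Map stage: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle stage: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce stage: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence on an iterable of text lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce stages run in parallel on many nodes and the framework performs the shuffle; here everything runs in one process purely to show the data flow.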
d2dbdpa From Data to Decision with Big Data and Predictive Analytics 21 hours Audience If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (such as Twitter, LinkedIn, etc.), this course is for you. It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analyzing. It is not aimed at people configuring the solution, although they will benefit from the big picture. Delivery Mode During the course delegates will be presented with working examples of mostly open source technologies. Short lectures will be followed by presentation and simple exercises by the participants Content and Software used All software used is updated each time the course is run so we check the newest versions possible. It covers the process from obtaining, formatting, processing and analysing the data, to explain how to automate decision making process with machine learning. Quick Overview Data Sources Mining Data Recommender systems Target Marketing Datatypes Structured vs unstructured Static vs streamed Attitudinal, behavioural and demographic data Data-driven vs user-driven analytics data validity Volume, velocity and variety of data Models Building models Statistical Models Machine learning Data Classification Clustering kGroups, k-means, nearest neighbours Ant colonies, birds flocking Predictive Models Decision trees Support vector machine Naive Bayes classification Neural networks Markov Model Regression Ensemble methods ROI Benefit/Cost ratio Cost of software Cost of development Potential benefits Building Models Data Preparation (MapReduce) Data cleansing Choosing methods Developing model Testing Model Model evaluation Model deployment and integration Overview of Open Source and commercial software Selection of R-project package Python libraries Hadoop and Mahout Selected Apache projects related to Big Data and Analytics Selected commercial solution
Integration with existing software and data sources
matlabml1 Introduction to Machine Learning with MATLAB 21 hours MATLAB Basics MATLAB More Advanced Features BP Neural Network RBF, GRNN and PNN Neural Networks SOM Neural Networks Support Vector Machine, SVM Extreme Learning Machine, ELM Decision Trees and Random Forests Genetic Algorithm, GA Particle Swarm Optimization, PSO Ant Colony Algorithm, ACA Simulated Annealing, SA Dimensionality Reduction and Feature Selection
hadoopba Hadoop for Business Analysts 21 hours Apache Hadoop is the most popular framework for processing Big Data. Hadoop provides rich and deep analytics capability, and it is making inroads into the traditional BI analytics world. This course will introduce an analyst to the core components of the Hadoop ecosystem and its analytics. Audience Business Analysts Duration three days Format Lectures and hands on labs. Section 1: Introduction to Hadoop hadoop history, concepts eco system distributions high level architecture hadoop myths hadoop challenges hardware / software Labs : first look at Hadoop Section 2: HDFS Overview concepts (horizontal scaling, replication, data locality, rack awareness) architecture (Namenode, Secondary namenode, Data node) data integrity future of HDFS : Namenode HA, Federation labs : Interacting with HDFS Section 3 : Map Reduce Overview mapreduce concepts daemons : jobtracker / tasktracker phases : driver, mapper, shuffle/sort, reducer Thinking in map reduce Future of mapreduce (yarn) labs : Running a Map Reduce program Section 4 : Pig pig vs java map reduce pig latin language user defined functions understanding pig job flow basic data analysis with Pig complex data analysis with Pig multi datasets with Pig advanced concepts lab : writing pig scripts to analyze / transform data Section 5: Hive hive concepts architecture SQL support in Hive data types table creation and queries Hive data management partitions & joins text analytics labs (multiple) : creating Hive tables and running queries, joins , using partitions, using text analytics functions Section 6: BI Tools for Hadoop BI tools and Hadoop Overview of current BI tools landscape Choosing the best tool for the job
aiintrozero From Zero to AI 35 hours This course is created for people who have no previous experience in probability and statistics. Probability (3.5h) Definition of probability Binomial distribution Everyday usage exercises Statistics (10.5h) Descriptive Statistics Inferential Statistics Regression Logistic Regression Exercises Intro to programming (3.5h) Procedural Programming Functional Programming OOP Programming Exercises (writing logic for a game of choice, e.g. noughts and crosses) Machine Learning (10.5h) Classification Clustering Neural Networks Exercises (write AI for a computer game of choice) Rules Engines and Expert Systems (7 hours) Intro to Rule Engines Write AI for the same game and combine solutions into hybrid approach
dataar Data Analytics With R 21 hours R is a very popular, open source environment for statistical computing, data analytics and graphics. This course introduces the R programming language to students.  It covers language fundamentals, libraries and advanced concepts, as well as advanced data analytics and graphing with real world data. Audience Developers / data analytics Duration 3 days Format Lectures and Hands-on Day One: Language Basics Course Introduction About Data Science Data Science Definition Process of Doing Data Science. Introducing R Language Variables and Types Control Structures (Loops / Conditionals) R Scalars, Vectors, and Matrices Defining R Vectors Matrices String and Text Manipulation Character data type File IO Lists Functions Introducing Functions Closures lapply/sapply functions DataFrames Labs for all sections Day Two: Intermediate R Programming DataFrames and File I/O Reading data from files Data Preparation Built-in Datasets Visualization Graphics Package plot() / barplot() / hist() / boxplot() / scatter plot Heat Map ggplot2 package ( qplot(), ggplot()) Exploration With Dplyr Labs for all sections Day Three: Advanced Programming With R Statistical Modeling With R Statistical Functions Dealing With NA Distributions (Binomial, Poisson, Normal) Regression Introducing Linear Regressions Recommendations Text Processing (tm package / Wordclouds) Clustering Introduction to Clustering KMeans Classification Introduction to Classification Naive Bayes Decision Trees Training using caret package Evaluating Algorithms R and Big Data Connecting R to databases Big Data Ecosystem Labs for all sections
hypertable Hypertable: Deploy a BigTable like database 14 hours Hypertable was an open-source software database management system based on the design of Google's Bigtable. In this instructor-led, live training, participants will learn how to set up and manage a Hypertable database system. By the end of this training, participants will be able to: Install, configure and upgrade a Hypertable instance Set up and administer a Hypertable cluster Monitor and optimize the performance of the database Design a Hypertable schema Work with Hypertable's API Troubleshoot operational issues Audience Developers Operations engineers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
tensorflowserving TensorFlow Serving 7 hours TensorFlow Serving is a system for serving machine learning (ML) models to production. In this instructor-led, live training, participants will learn how to configure and use TensorFlow Serving to deploy and manage ML models in a production environment. By the end of this training, participants will be able to: Train, export and serve various TensorFlow models Test and deploy algorithms using a single architecture and set of APIs Extend TensorFlow Serving to serve other types of models beyond TensorFlow models Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
marvin Marvin Image Processing Framework - creating image and video processing applications with Marvin 14 hours Marvin is an extensible, cross-platform, open-source image and video processing framework developed in Java.  Developers can use Marvin to manipulate images, extract features from images for classification tasks, generate figures algorithmically, process video file datasets, and set up unit test automation. Some of Marvin's video applications include filtering, augmented reality, object tracking and motion detection. In this course participants will learn the principles of image and video analysis and utilize the Marvin Framework and its image processing algorithms to construct their own application. Audience     Software developers wishing to utilize a rich, plug-in based open-source framework to create image and video processing applications Format of the course     The basic principles of image analysis, video analysis and the Marvin Framework are first introduced. Students are given project-based tasks which allow them to practice the concepts learned. By the end of the class, participants will have developed their own application using the Marvin Framework and libraries. Introduction to Marvin Downloading and installing Marvin Setting up an Eclipse development environment The three layers of the Marvin architecture     Framework     Plug-ins     Applications Components and libraries Image processing in Marvin Video processing in Marvin Multi-threading in Marvin Unit testing in Marvin Working with MarvinEditor Creating an application with Marvin Working with plug-ins Testing the application Video applications     Video filtering     Image subtraction and combination     Tracking     Face features detection     Real time tracking of multiple blobs     Partial shape matching     Skin-colored pixels detection Using Marvin Framework for test automation Extending the framework Contributing to the project Closing remarks
mlbankingr Machine Learning for Banking (with R) 28 hours In this instructor-led, live training, participants will learn how to apply machine learning techniques and tools for solving real-world problems in the banking industry. R will be used as the programming language. Participants first learn the key principles, then put their knowledge into practice by building their own machine learning models and using them to complete live team projects. Introduction Difference between statistical learning (statistical analysis) and machine learning Adoption of machine learning technology by finance and banking companies Different Types of Machine Learning Supervised learning vs unsupervised learning Iteration and evaluation Bias-variance trade-off Combining supervised and unsupervised learning (semi-supervised learning) Machine Learning Languages and Toolsets Open source vs proprietary systems and software R vs Python vs Matlab Libraries and frameworks Machine Learning Case Studies Consumer data and big data Assessing risk in consumer and business lending Improving customer service through sentiment analysis Detecting identity fraud, billing fraud and money laundering Introduction to R Installing the RStudio IDE Loading R packages Data structures Vectors Factors Lists Data Frames Matrixes and Arrays How to Load Machine Learning Data Databases, data warehouses and streaming data Distributed storage and processing with Hadoop and Spark Importing data from a database Importing data from Excel and CSV Modeling Business Decisions with Supervised Learning Classifying your data (classification) Using regression analysis to predict outcome Choosing from available machine learning algorithms Understanding decision tree algorithms Understanding random forest algorithms Model evaluation Exercise Regression Analysis Linear regression Generalizations and Nonlinearity Exercise Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Exercise Hands-on: 
Building an Estimation Model Assessing lending risk based on customer type and history Evaluating the performance of Machine Learning Algorithms Cross-validation and resampling Bootstrap aggregation (bagging) Exercise Modeling Business Decisions with Unsupervised Learning K-means clustering Challenges of unsupervised learning Beyond K-means Exercise Hands-on: Building a Recommendation System Analyzing past customer behavior to improve new service offerings Extending your company's capabilities Developing models in the cloud Accelerating machine learning with additional GPUs Beyond machine learning: Artificial Intelligence (AI) Applying Deep Learning neural networks for computer vision, voice recognition and text analysis Closing Remarks
accumulo Apache Accumulo: Building highly scalable big data applications 21 hours Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval. It is based on the design of Google's BigTable and is powered by Apache Hadoop, Apache Zookeeper, and Apache Thrift.   This course covers the working principles behind Accumulo and walks participants through the development of a sample application on Apache Accumulo. Audience     Application developers     Software engineers     Technical consultants Format of the course     Part lecture, part discussion, hands-on development and implementation, occasional tests to gauge understanding Introduction Installing Accumulo Configuring Accumulo Understanding Accumulo's data model, architecture, and components Working with the shell Database operations Configuring your tables Accumulo iterators Developing an application in Accumulo Securing your application Reading and writing secondary indexes Working with Mapreduce, Spark, and Thrift Proxy Testing your application Troubleshooting Deploying your application Accumulo Administrative tasks
apachemdev Apache Mahout for Developers 14 hours Audience Developers involved in projects that use machine learning with Apache Mahout. Format Hands on introduction to machine learning. The course is delivered in a lab format based on real world practical use cases. Implementing Recommendation Systems with Mahout Introduction to recommender systems Representing recommender data Making recommendation Optimizing recommendation Clustering Basics of clustering Data representation Clustering algorithms Clustering quality improvements Optimizing clustering implementation Application of clustering in real world Classification Basics of classification Classifier training Classifier quality improvements
Fairsec Fairseq: Setting up a CNN-based machine translation system 7 hours Fairseq is an open-source sequence-to-sequence learning toolkit created by Facebook for use in Neural Machine Translation (NMT). In this training participants will learn how to use Fairseq to carry out translation of sample content. By the end of this training, participants will have the knowledge and practice needed to implement a live Fairseq based machine translation solution. Source and target language content samples can be prepared according to audience's requirements. Audience Localization specialists with a technical background Global content managers Localization engineers Software developers in charge of implementing global content solutions Format of the course     Part lecture, part discussion, heavy hands-on practice Introduction     Why Neural Machine Translation? Overview of the Torch project Overview of a Convolutional Neural Machine Translation model     Convolutional Sequence to Sequence Learning     Convolutional Encoder Model for Neural Machine Translation     Standard LSTM-based model Overview of training approaches     About GPUs and CPUs     Fast beam search generation Installation and setup Evaluating pre-trained models Preprocessing your data Training the model Translating Converting a trained model to use CPU-only operations Joining the community Closing remarks
python_nltk Natural Language Processing with Python 28 hours This course introduces linguists or programmers to NLP in Python. During this course we will mostly use nltk.org (Natural Language Tool Kit), but we will also use other libraries relevant and useful for NLP. At the moment we can conduct this course in Python 2.x or Python 3.x. Examples are in English or Mandarin (普通话). Other languages can also be made available if agreed before booking. Overview of Python packages related to NLP   Introduction to NLP (examples in Python of course) Simple Text Manipulation Searching Text Counting Words Splitting Texts into Words Lexical dispersion Processing complex structures Representing text in Lists Indexing Lists Collocations Bigrams Frequency Distributions Conditionals with Words Comparing Words (startswith, endswith, islower, isalpha, etc...) Natural Language Understanding Word Sense Disambiguation Pronoun Resolution Machine translations (statistical, rule based, literal, etc...) Exercises NLP in Python in examples Accessing Text Corpora and Lexical Resources Common sources for corpora Conditional Frequency Distributions Counting Words by Genre Creating own corpus Pronouncing Dictionary Shoebox and Toolbox Lexicons Senses and Synonyms Hierarchies Lexical Relations: Meronyms, Holonyms Semantic Similarity Processing Raw Text Printing, truncating, extracting parts of string, accessing individual characters, searching, replacing, splitting, joining, indexing, etc...
using regular expressions detecting word patterns stemming tokenization normalization of text Word Segmentation (especially in Chinese) Categorizing and Tagging Words Tagged Corpora Tagged Tokens Part-of-Speech Tagset Python Dictionaries Words to Properties mapping Automatic Tagging Determining the Category of a Word (Morphological, Syntactic, Semantic) Text Classification (Machine Learning) Supervised Classification Sentence Segmentation Cross Validation Decision Trees Extracting Information from Text Chunking Chinking Tags vs Trees Analyzing Sentence Structure Context Free Grammar Parsers Building Feature Based Grammars Grammatical Features Processing Feature Structures Analyzing the Meaning of Sentences Semantics and Logic Propositional Logic First-Order Logic Discourse Semantics  Managing Linguistic Data  Data Formats (Lexicon vs Text) Metadata
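As a rough illustration of the "Counting Words", "Bigrams" and "Frequency Distributions" topics in the outline above, the sketch below uses only the Python standard library rather than NLTK. The sample sentence and the helper names `tokenize` and `bigrams` are assumptions made for this example and are not taken from the course material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


# Hypothetical sample text, used only for illustration.
sample = "The cat sat on the mat. The cat slept."

tokens = tokenize(sample)
word_freq = Counter(tokens)             # frequency distribution over words
bigram_freq = Counter(bigrams(tokens))  # frequency distribution over word pairs

print(word_freq.most_common(2))    # the two most frequent words
print(bigram_freq.most_common(1))  # the most frequent bigram
```

A `Counter` plays roughly the role of a frequency distribution here: it maps each word (or word pair) to the number of times it occurs.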
caffe Deep Learning for Vision with Caffe 21 hours Caffe is a deep learning framework made with expression, speed, and modularity in mind. This course explores the application of Caffe as a Deep learning framework for image recognition using MNIST as an example Audience This course is suitable for Deep Learning researchers and engineers interested in utilizing Caffe as a framework. After completing this course, delegates will be able to: understand Caffe’s structure and deployment mechanisms carry out installation / production environment / architecture tasks and configuration assess code quality, perform debugging, monitoring implement advanced production like training models, implementing layers and logging Installation Docker Ubuntu RHEL / CentOS / Fedora installation Windows Caffe Overview Nets, Layers, and Blobs: the anatomy of a Caffe model. Forward / Backward: the essential computations of layered compositional models. Loss: the task to be learned is defined by the loss. Solver: the solver coordinates model optimization. Layer Catalogue: the layer is the fundamental unit of modeling and computation – Caffe’s catalogue includes layers for state-of-the-art models. Interfaces: command line, Python, and MATLAB Caffe. Data: how to caffeinate data for model input. Caffeinated Convolution: how Caffe computes convolutions. New models and new code Detection with Fast R-CNN Sequences with LSTMs and Vision + Language with LRCN Pixelwise prediction with FCNs Framework design and future Examples: MNIST    
datavis1 Data Visualization 28 hours This course is intended for engineers and decision makers working in data mining and knowledge discovery. You will learn how to create effective plots and how to present your data in a way that appeals to decision makers and helps them understand hidden information. Day 1: what is data visualization why it is important data visualization vs data mining human cognition HMI common pitfalls Day 2: different types of curves drill down curves categorical data plotting multi variable plots data glyph and icon representation Day 3: plotting KPIs with data R and X charts examples what if dashboards parallel axes mixing categorical data with numeric data Day 4: different hats of data visualization how can data visualization lie disguised and hidden trends a case study of student data visual queries and region selection
radvml Advanced Machine Learning with R 21 hours In this instructor-led, live training, participants will learn advanced techniques for Machine Learning with R as they step through the creation of a real-world application. By the end of this training, participants will be able to: Use techniques such as hyper-parameter tuning and deep learning Understand and implement unsupervised learning techniques Put a model into production for use in a larger application Audience Developers Analysts Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
dsbda Data Science for Big Data Analytics 35 hours Introduction to Data Science for Big Data Analytics Data Science Overview Big Data Overview Data Structures Drivers and complexities of Big Data Big Data ecosystem and a new approach to analytics Key technologies in Big Data Data Mining process and problems Association Pattern Mining Data Clustering Outlier Detection Data Classification Introduction to Data Analytics lifecycle Discovery Data preparation Model planning Model building Presentation/Communication of results Operationalization Exercise: Case study From this point most of the training time (80%) will be spent on examples and exercises in R and related big data technology. Getting started with R Installing R and RStudio Features of R language Objects in R Data in R Data manipulation Big data issues Exercises Getting started with Hadoop Installing Hadoop Understanding Hadoop modes HDFS MapReduce architecture Hadoop related projects overview Writing programs in Hadoop MapReduce Exercises Integrating R and Hadoop with RHadoop Components of RHadoop Installing RHadoop and connecting with Hadoop The architecture of RHadoop Hadoop streaming with R Data analytics problem solving with RHadoop Exercises Pre-processing and preparing data Data preparation steps Feature extraction Data cleaning Data integration and transformation Data reduction – sampling, feature subset selection, Dimensionality reduction Discretization and binning Exercises and Case study Exploratory data analytic methods in R Descriptive statistics Exploratory data analysis Visualization – preliminary steps Visualizing single variable Examining multiple variables Statistical methods for evaluation Hypothesis testing Exercises and Case study Data Visualizations Basic visualizations in R Packages for data visualization ggplot2, lattice, plotly Formatting plots in R Advanced graphs Exercises Regression (Estimating future values) Linear regression Use cases Model description Diagnostics 
Problems with linear regression Shrinkage methods, ridge regression, the lasso Generalizations and nonlinearity Regression splines Local polynomial regression Generalized additive models Regression with RHadoop Exercises and Case study Classification The classification related problems Bayesian refresher Naïve Bayes Logistic regression K-nearest neighbors Decision trees algorithm Neural networks Support vector machines Diagnostics of classifiers Comparison of classification methods Scalable classification algorithms Exercises and Case study Assessing model performance and selection Bias, Variance and model complexity Accuracy vs Interpretability Evaluating classifiers Measures of model/algorithm performance Hold-out method of validation Cross-validation Tuning machine learning algorithms with caret package Visualizing model performance with Profit ROC and Lift curves Ensemble Methods Bagging Random Forests Boosting Gradient boosting Exercises and Case study Support vector machines for classification and regression Maximal Margin classifiers Support vector classifiers Support vector machines SVM’s for classification problems SVM’s for regression problems Exercises and Case study Identifying unknown groupings within a data set Feature Selection for Clustering Representative based algorithms: k-means, k-medoids Hierarchical algorithms: agglomerative and divisive methods Probabilistic base algorithms: EM Density based algorithms: DBSCAN, DENCLUE Cluster validation Advanced clustering concepts Clustering with RHadoop Exercises and Case study Discovering connections with Link Analysis Link analysis concepts Metrics for analyzing networks The Pagerank algorithm Hyperlink-Induced Topic Search Link Prediction Exercises and Case study Association Pattern Mining Frequent Pattern Mining Model Scalability issues in frequent pattern mining Brute Force algorithms Apriori algorithm The FP growth approach Evaluation of Candidate Rules Applications of Association Rules Validation 
and Testing Diagnostics Association rules with R and Hadoop Exercises and Case study Constructing recommendation engines Understanding recommender systems Data mining techniques used in recommender systems Recommender systems with recommenderlab package Evaluating the recommender systems Recommendations with RHadoop Exercise: Building recommendation engine Text analysis Text analysis steps Collecting raw text Bag of words Term Frequency –Inverse Document Frequency Determining Sentiments Exercises and Case study
Torch Torch: Getting started with Machine and Deep Learning 21 hours Torch is an open source machine learning library and a scientific computing framework based on the Lua programming language. It provides a development environment for numerics, machine learning, and computer vision, with a particular emphasis on deep learning and convolutional nets. It is one of the fastest and most flexible frameworks for Machine and Deep Learning and is used by companies such as Facebook, Google, Twitter, NVIDIA, AMD, Intel, and many others. In this course we cover the principles of Torch, its unique features, and how it can be applied in real-world applications. We step through numerous hands-on exercises throughout, demonstrating and practicing the concepts learned. By the end of the course, participants will have a thorough understanding of Torch's underlying features and capabilities as well as its role and contribution within the AI space compared to other frameworks and libraries. Participants will have also received the necessary practice to implement Torch in their own projects. Audience     Software developers and programmers wishing to enable Machine and Deep Learning within their applications Format of the course     Overview of Machine and Deep Learning     In-class coding and integration exercises     Test questions sprinkled along the way to check understanding Introduction to Torch     Like NumPy but with CPU and GPU implementation     Torch's usage in machine learning, computer vision, signal processing, parallel processing, image, video, audio and networking Installing Torch     Linux, Windows, Mac     Bitnami and Docker Installing Torch packages     Using the LuaRocks package manager Choosing an IDE for Torch     ZeroBrane Studio     Eclipse plugin for Lua Working with the Lua scripting language and LuaJIT     Lua's integration with C/C++     Lua syntax: datatypes, loops and conditionals, functions, tables, and file i/o.     
Object orientation and serialization in Torch     Coding exercise Loading a dataset in Torch     MNIST     CIFAR-10, CIFAR-100     Imagenet Machine Learning in Torch     Deep Learning         Manual feature extraction vs convolutional networks     Supervised and Unsupervised Learning         Building a neural network with Torch         N-dimensional arrays Image analysis with Torch     Image package     The Tensor library Working with the REPL interpreter Working with databases Networking and Torch GPU support in Torch Integrating Torch     C, Python, and others Embedding Torch     iOS and Android Other frameworks and libraries     Facebook's optimized deep-learning modules and containers Creating your own package Testing and debugging Releasing your application The future of AI and Torch
mlbankingpython_ Machine Learning for Banking (with Python) 21 hours In this instructor-led, live training, participants will learn how to apply machine learning techniques and tools for solving real-world problems in the banking industry. Python will be used as the programming language. Participants first learn the key principles, then put their knowledge into practice by building their own machine learning models and using them to complete live team projects. Introduction Difference between statistical learning (statistical analysis) and machine learning Adoption of machine learning technology and talent by finance and banking companies Different Types of Machine Learning Supervised learning vs unsupervised learning Iteration and evaluation Bias-variance trade-off Combining supervised and unsupervised learning (semi-supervised learning) Machine Learning Languages and Toolsets Open source vs proprietary systems and software Python vs R vs Matlab Libraries and frameworks Machine Learning Case Studies Consumer data and big data Assessing risk in consumer and business lending Improving customer service through sentiment analysis Detecting identity fraud, billing fraud and money laundering Hands-on: Python for Machine Learning Preparing the Development Environment Obtaining Python machine learning libraries and packages Working with scikit-learn and PyBrain How to Load Machine Learning Data Databases, data warehouses and streaming data Distributed storage and processing with Hadoop and Spark Exported data and Excel Modeling Business Decisions with Supervised Learning Classifying your data (classification) Using regression analysis to predict outcome Choosing from available machine learning algorithms Understanding decision tree algorithms Understanding random forest algorithms Model evaluation Exercise Regression Analysis Linear regression Generalizations and Nonlinearity Exercise Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors 
Exercise Hands-on: Building an Estimation Model Assessing lending risk based on customer type and history Evaluating the performance of Machine Learning Algorithms Cross-validation and resampling Bootstrap aggregation (bagging) Exercise Modeling Business Decisions with Unsupervised Learning K-means clustering Challenges of unsupervised learning Beyond K-means Exercise Hands-on: Building a Recommendation System Analyzing past customer behavior to improve new service offerings Extending your company's capabilities Developing models in the cloud Accelerating machine learning with GPU Beyond machine learning: Artificial Intelligence (AI) Applying Deep Learning neural networks for computer vision, voice recognition and text analysis Closing Remarks
teraintro Teradata Fundamentals 21 hours Teradata is a popular relational database management system (RDBMS). It is mainly suited to building large-scale data warehousing applications, which it achieves through parallelism. This course introduces the delegates to Teradata. Introduction to Teradata Background Why use Teradata User Scalability Relational Concepts Introduction to RDBMS  Warehousing Concepts Set Up and Installation Installation Tools and Utilities like BTEQ Teradata Architecture Components Node Parsing Engine Message Passing Layer - BYNET Access Module Processor Storage Architecture Retrieval Architecture Architectural Overview Teradata Basic Concepts - SQL Data Type Tables Permanent Volatile Global Temporary Derived Set vs Multiset Tables Playing with Data - CRUD Operations [DDL and DML] Logical and Conditional Operators SET Operators String Manipulation Date/Time Built in and Aggregate Functions Joins and Subqueries Indexes Primary Secondary Teradata Advanced Concepts Case Coalesce Macros Stored Procedures Space Temp Spool Permanent Join Strategies Statistics Compression Hashing Algorithm OLAP Functions User Management Teradata Additional Concepts Utilities FastLoad MultiLoad FastExport BTEQ Data Protection Methodologies Optimization Strategies Note: The training will be a mix of theory and hands-on work, and it will be helpful if the delegates actively participate in the given exercises.
mlintro Introduction to Machine Learning 7 hours This training course is for people who would like to apply basic Machine Learning techniques in practical applications. Audience Data scientists and statisticians who have some familiarity with machine learning and know how to program in R. The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose is to give a practical introduction to machine learning to participants interested in applying the methods at work. Sector-specific examples are used to make the training relevant to the audience. Naive Bayes Multinomial models Bayesian categorical data analysis Discriminant analysis Linear regression Logistic regression GLM EM Algorithm Mixed Models Additive Models Classification KNN Ridge regression Clustering
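To make the "Classification KNN" item in the outline above more concrete, here is a minimal k-nearest-neighbours sketch. The course itself works in R; Python is used here only for illustration, and the toy data set and the function name `knn_predict` are assumptions for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-class toy data: cluster "a" near (1, 1), cluster "b" near (5, 5).
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 5.1)))
```

The choice of `k` trades off noise sensitivity (small `k`) against blurring of class boundaries (large `k`); in practice it is usually tuned by cross-validation.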
hadoopadm1 Hadoop For Administrators 21 hours Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three (optionally, four) days course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan cluster deployment and growth, how to install, maintain, monitor, troubleshoot and optimize Hadoop. They will also practice cluster bulk data load, get familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course finishes off with discussion of securing cluster with Kerberos. “…The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organized” — Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising Audience Hadoop administrators Format Lectures and hands-on labs, approximate balance 60% lectures, 40% labs. Introduction Hadoop history, concepts Ecosystem Distributions High level architecture Hadoop myths Hadoop challenges (hardware / software) Labs: discuss your Big Data projects and problems Planning and installation Selecting software, Hadoop distributions Sizing the cluster, planning for growth Selecting hardware and network Rack topology Installation Multi-tenancy Directory structure, logs Benchmarking Labs: cluster install, run performance benchmarks HDFS operations Concepts (horizontal scaling, replication, data locality, rack awareness) Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode) Health monitoring Command-line and browser-based administration Adding storage, replacing defective drives Labs: getting familiar with HDFS command lines Data ingestion Flume for logs and other data ingestion into HDFS Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL Hadoop data warehousing with Hive Copying data between clusters (distcp) Using S3 as complementary to HDFS Data ingestion best practices and 
architectures Labs: setting up and using Flume, the same for Sqoop MapReduce operations and administration Parallel computing before mapreduce: compare HPC vs Hadoop administration MapReduce cluster loads Nodes and Daemons (JobTracker, TaskTracker) MapReduce UI walk through Mapreduce configuration Job config Optimizing MapReduce Fool-proofing MR: what to tell your programmers Labs: running MapReduce examples YARN: new architecture and new capabilities YARN design goals and implementation architecture New actors: ResourceManager, NodeManager, Application Master Installing YARN Job scheduling under YARN Labs: investigate job scheduling Advanced topics Hardware monitoring Cluster monitoring Adding and removing servers, upgrading Hadoop Backup, recovery and business continuity planning Oozie job workflows Hadoop high availability (HA) Hadoop Federation Securing your cluster with Kerberos Labs: set up monitoring Optional tracks Cloudera Manager for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5) Ambari for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)
MicrosoftCognitiveToolkit Microsoft Cognitive Toolkit 2.x 21 hours Microsoft Cognitive Toolkit 2.x (previously CNTK) is an open-source, commercial-grade toolkit that trains deep learning algorithms to learn like the human brain. According to Microsoft, CNTK can be 5-10x faster than TensorFlow on recurrent networks, and 2 to 3 times faster than TensorFlow for image-related tasks. In this instructor-led, live training, participants will learn how to use Microsoft Cognitive Toolkit to create, train and evaluate deep learning algorithms for use in commercial-grade AI applications involving multiple types of data such as data, speech, text, and images. By the end of this training, participants will be able to: Access CNTK as a library from within a Python, C#, or C++ program Use CNTK as a standalone machine learning tool through its own model description language (BrainScript) Use the CNTK model evaluation functionality from a Java program Combine feed-forward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs) Scale computation capacity on CPUs, GPUs and multiple machines Access massive datasets using existing programming languages and algorithms Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Note If you wish to customize any part of this training, including the programming language of choice, please contact us to arrange. To request a customized course outline for this training, please contact us.
genealgo Genetic Algorithms 28 hours This four-day course teaches how genetic algorithms work and how to select the model parameters of a genetic algorithm. It presents many applications of genetic algorithms and uses them to tackle optimization problems. Day 1: What is a genetic algorithm? Chromosome fitness Choosing the random initial population The crossover operations A numeric optimization example Day 2: When to use genetic algorithm Coding the gene Local maxima and mutation operation Population diversity Day 3: The meaning and effect of each genetic algorithm parameter Varying genetic parameters Optimizing scheduling problems Cross over and mutation for scheduling problems Day 4: Optimizing program or set of rules Cross over and mutation operations for optimizing programs Creating a parallel model of the genetic algorithm Evaluating the genetic algorithm Applications of genetic algorithm
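The Day 1 and Day 2 topics above (chromosome fitness, random initial population, crossover, mutation) can be sketched in a few lines. The example below is a minimal illustration, not the course's own code: it assumes a toy "OneMax" problem (maximize the number of 1 bits in a chromosome), and the function names and parameter defaults are chosen only for this example.

```python
import random


def fitness(chromosome):
    """Fitness of a bit-string chromosome: the number of 1 bits (OneMax)."""
    return sum(chromosome)


def crossover(a, b, rng):
    """Single-point crossover: head of parent a joined to tail of parent b."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def run_ga(length=20, pop_size=30, generations=50, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Random initial population of bit strings.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]             # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga()
print(fitness(best), history[0], history[-1])
```

Because the best chromosome is copied unchanged into each new generation, the best fitness never decreases; the mutation rate and population size control how much diversity is kept to escape local maxima.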
encogintro Encog: Introduction to Machine Learning 14 hours Encog is an open-source machine learning framework for Java and .NET. In this instructor-led, live training, participants will learn how to create various neural network components using Encog. Real-world case studies will be discussed and machine learning-based solutions to these problems will be explored. By the end of this training, participants will be able to: Prepare data for neural networks using the normalization process Implement feed forward networks and propagation training methodologies Implement classification and regression tasks Model and train neural networks using Encog's GUI based workbench Integrate neural network support into real-world applications Audience Developers Analysts Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
iotemi IoT (Internet of Things) for Entrepreneurs, Managers and Investors 21 hours Unlike other technologies, IoT is far more complex, encompassing almost every branch of core engineering: mechanical, electronics, firmware, middleware, cloud, analytics and mobile. For each of its engineering layers, there are aspects of economics, standards, regulations and an evolving state of the art. For the first time, a modest course is offered to cover all of these critical aspects of IoT engineering. Summary An advanced training program covering the current state of the art in Internet of Things Cuts across multiple technology domains to develop awareness of an IoT system and its components and how it can help businesses and organizations. Live demo of model IoT applications to showcase practical IoT deployments across different industry domains, such as Industrial IoT, Smart Cities, Retail, Travel & Transportation and use cases around connected devices & things Target Audience Managers responsible for business and operational processes within their respective organizations who want to know how to harness IoT to make their systems and processes more efficient. Entrepreneurs and Investors who are looking to build new ventures and want to develop a better understanding of the IoT technology landscape to see how they can leverage it in an effective manner. Estimates for Internet of Things or IoT market value are massive, since by definition the IoT is an integrated and diffused layer of devices, sensors, and computing power that overlays entire consumer, business-to-business, and government industries. The IoT will account for an increasingly huge number of connections: 1.9 billion devices today, and 9 billion by 2018. That year, it will be roughly equal to the number of smartphones, smart TVs, tablets, wearable computers, and PCs combined. 
In the consumer space, many products and services have already crossed over into the IoT, including kitchen and home appliances, parking, RFID, lighting and heating products, and a number of applications in the Industrial Internet. The underlying technologies of IoT are nothing new, however, as M2M communication has existed since the birth of the Internet. What has changed in the last couple of years is the emergence of a number of inexpensive wireless technologies, aided by the overwhelming adoption of smartphones and tablets in every home. Explosive growth of mobile devices led to the present demand for IoT. Due to unbounded opportunities in the IoT business, a large number of small and medium-sized entrepreneurs have jumped on the bandwagon of the IoT gold rush. Also, due to the emergence of open source electronics and IoT platforms, the cost of developing an IoT system and managing its sizable production is increasingly affordable. Existing electronic product owners are experiencing pressure to integrate their devices with the Internet or a mobile app. This training is intended as a technology and business review of an emerging industry so that IoT enthusiasts/entrepreneurs can grasp the basics of IoT technology and business. Course Objective The main objective of the course is to introduce emerging technological options, platforms and case studies of IoT implementation in home & city automation (smart homes and cities), Industrial Internet, healthcare, Govt., Mobile Cellular and other areas. Basic introduction of all the elements of IoT: Mechanical, Electronics/sensor platform, Wireless and wireline protocols, Mobile to Electronics integration, Mobile to enterprise integration, Data analytics and Total control plane M2M Wireless protocols for IoT: WiFi, Zigbee/Zwave, Bluetooth, ANT+: When and where to use which one? Mobile/Desktop/Web app for registration, data acquisition and control Available M2M data acquisition platforms for IoT: Xively, Omega and NovoTech, etc. 
Security issues and security solutions for IoT Open source/commercial electronics platforms for IoT: Raspberry Pi, Arduino, ArmMbedLPC, etc. Open source/commercial enterprise cloud platforms for IoT apps: AWS-IoT, Azure-IoT, Watson-IoT cloud, in addition to other minor IoT clouds Studies of business and technology of some of the common IoT devices like Home automation, Smoke alarm, vehicles, military, home health etc. Session 1: Business Overview of Why IoT is so important Case Studies from Nest, CISCO and top industries IoT adoption rate in North America and how industries are aligning their future business models and operations around IoT Broad Scale Application Area Smart House and Smart City Industrial Internet Smart Cars Wearables Home Healthcare Business Rule Generation for IoT 3 layered architecture of Big Data: Physical (Sensors), Communication, and Data Intelligence Session 2: Introduction of IoT: All about Sensors and Electronics Basic function and architecture of a sensor: sensor body, sensor mechanism, sensor calibration, sensor maintenance, cost and pricing structure, legacy and modern sensor networks (all the basics about sensors) Development of sensor electronics: IoT vs legacy, and open source vs traditional PCB design style Development of sensor communication protocols, from history to modern days. Legacy protocols like Modbus, relay, HART to modern day Zigbee, Zwave, X10, Bluetooth, ANT, etc. Business drivers for sensor deployment: FDA/EPA regulation, fraud/tampering detection, supervision, quality control and process management Different Kinds of Calibration Techniques (manual, automation, infield, primary and secondary calibration) and their implications in IoT Powering options for sensors: battery, solar, Witricity, Mobile and PoE Hands on training with single silicon and other sensors like temperature, pressure, vibration, magnetic field, power factor etc. 
Demo: Logging data from a temperature sensor Session 3: Fundamentals of M2M communication: Sensor Network and Wireless protocol What is a sensor network? What is an ad-hoc network? Wireless vs. Wireline network WiFi 802.11 families: N to S, application of standards and common vendors. Zigbee and Zwave: advantage of low power mesh networking. Long distance Zigbee. Introduction to different Zigbee chips. Bluetooth/BLE: Low power vs high power, speed of detection, class of BLE. Introduction of Bluetooth vendors & their review. Creating a network with wireless protocols such as Piconet by BLE Protocol stacks and packet structure for BLE and Zigbee Other long distance RF communication links LOS vs NLOS links Capacity and throughput calculation Application issues in wireless protocols: power consumption, reliability, PER, QoS, LOS Sensor networks for WAN deployment using LPWAN. Comparison of various emerging protocols such as LoRaWAN, NB-IoT etc. Hands on training with sensor network Demo: Device control using BLE Session 4: Review of Electronics Platform, production and cost projection PCB vs FPGA vs ASIC design: how to decide Prototyping electronics vs Production electronics QA certificates for IoT: CE/CSA/UL/IEC/RoHS/IP65: What are they and when are they needed? Basic introduction of multi-layer PCB design and its workflow Electronics reliability: basic concept of FIT and early mortality rate Environmental and reliability testing: basic concepts Basic Open source platforms: Arduino, Raspberry Pi, Beaglebone: when are they needed? Session 5: Conceiving a new IoT product: Product requirement document for IoT State of the present art and review of existing technology in the market place Suggestion for new features and technologies based on market analysis and patent issues Detailed technical specs for new products: System, software, hardware, mechanical, installation etc. 
Packaging and documentation requirements Servicing and customer support requirements High level design (HLD) for understanding of product concept Release plan for phase wise introduction of the new features Skill set for the development team and proposed project plan: cost & duration Target manufacturing price Session 6 — Introduction to Mobile app platform for IoT Protocol stack of Mobile app for IoT Mobile to server integration: what factors to look out for What intelligent layers can be introduced at the mobile app level? iBeacon in iOS Windows Azure Amazon AWS-IoT Web Interfaces for Mobile Apps (REST/WebSockets) IoT Application layer protocols (MQTT/CoAP) Security for IoT middleware: keys, tokens and random password generation for authentication of the gateway devices. Demo: Mobile app for tracking IoT enabled trash cans Session 7 — Machine learning for intelligent IoT Introduction to Machine learning Learning classification techniques Bayesian Prediction: preparing training file Support Vector Machine Image and video analytics for IoT Fraud and alert analytics through IoT Biometric ID integration with IoT Real Time Analytics/Stream Analytics Scalability issues of IoT and machine learning Architectural implementations of machine learning for IoT Demo: Using KNN Algorithm for regression analysis Demo: SVM based classification for image and video analysis Session 8 — Analytic Engine for IoT Insight analytics Visualization analytics Structured predictive analytics Unstructured predictive analytics Recommendation Engine Pattern detection Rule/Scenario discovery — failure, fraud, optimization Root cause discovery Session 9 — Security in IoT implementation Why security is absolutely essential for IoT Mechanism of security breach in IoT layer Privacy enhancing technologies Fundamentals of network security Encryption and cryptography implementation for IoT data Security standards for available platforms European legislation for security in IoT platform 
Secure booting Device authentication Firewalling and IPS Updates and patches Session 10 — Database implementation for IoT: Cloud based IoT platforms SQL vs NoSQL: which one is good for your IoT application Open source vs. licensed databases Available M2M cloud platforms Cassandra: Time Series Data MongoDB Omega Ayla Libelium CISCO M2M platform AT&T M2M platform Google M2M platform Session 11 — A few common IoT systems Home automation Energy optimization in Home Automotive-OBD IoT-Lock Smart Smoke alarm BAC (Blood alcohol monitoring) for drug abusers under probation Pet cam for Pet lovers Wearable IoT Mobile parking ticketing system Indoor location tracking in Retail store Home health care Smart Sports Watch Demo: Smart city application using IoT Demo: Retail, Transportation & Logistics Use case for IoT Session 12 — Big Data for IoT 4V- Volume, velocity, variety and veracity of Big Data Why Big Data is important in IoT Big Data vs legacy data in IoT Hadoop for IoT: when and why? Storage technique for image, Geospatial and video data Distributed database- Cassandra as example Parallel computing basics for IoT Micro services Architecture Demo: Apache Spark
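Session 6 of the outline above mentions keys, tokens and random password generation for authenticating gateway devices. The following is a minimal sketch of that idea using only the Python standard library. The function names (`new_device_key`, `sign_message`, `verify_message`) and the choice of HMAC-SHA256 are illustrative assumptions, not part of the course material.

```python
import hashlib
import hmac
import secrets


def new_device_key() -> str:
    """Generate a random 256-bit key as a hex string (hypothetical helper)."""
    return secrets.token_hex(32)


def sign_message(key_hex: str, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the message with the device key."""
    return hmac.new(bytes.fromhex(key_hex), message, hashlib.sha256).hexdigest()


def verify_message(key_hex: str, message: bytes, signature: str) -> bool:
    """Check the tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign_message(key_hex, message), signature)


# Example: a gateway signs a sensor reading, the server verifies it.
key = new_device_key()
reading = b'{"sensor": "temp-1", "value": 21.5}'
tag = sign_message(key, reading)
is_valid = verify_message(key, reading, tag)
```

In a real deployment the key would be provisioned securely per device; this sketch only shows the sign-and-verify round trip.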
OpenNN OpenNN: Implementing neural networks 14 hours OpenNN is an open-source class library written in C++ that implements neural networks for use in machine learning. In this course we go over the principles of neural networks and use OpenNN to implement a sample application. Audience     Software developers and programmers wishing to create Deep Learning applications. Format of the course     Lecture and discussion coupled with hands-on exercises. Introduction to OpenNN, Machine Learning and Deep Learning Downloading OpenNN Working with Neural Designer     Using Neural Designer for descriptive, diagnostic, predictive and prescriptive analytics OpenNN architecture     CPU parallelization OpenNN classes     Data set, neural network, loss index, training strategy, model selection, testing analysis     Vector and matrix templates Building a neural network application     Choosing a suitable neural network     Formulating the variational problem (loss index)     Solving the reduced function optimization problem (training strategy) Working with datasets      The data matrix (columns as variables and rows as instances) Learning tasks     Function regression     Pattern recognition Compiling with Qt Creator Integrating, testing and debugging your application The future of neural networks and OpenNN
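The outline above describes a data matrix with columns as variables and rows as instances. As a language-neutral illustration, here is a small plain-Python sketch of that layout. It does not use the OpenNN C++ API; the variable names and values are made up for the example.

```python
# Hypothetical toy data set: each row is one instance, each column one variable.
variables = ["temperature", "pressure", "target"]
data = [
    [20.0, 1.01, 0.0],
    [25.0, 1.03, 1.0],
    [30.0, 0.99, 1.0],
]


def column(matrix, name):
    """Return the values of one variable (one column) across all instances."""
    idx = variables.index(name)
    return [row[idx] for row in matrix]


def split_inputs_targets(matrix, target_name):
    """Separate input columns from the target column, instance by instance."""
    t = variables.index(target_name)
    inputs = [[v for i, v in enumerate(row) if i != t] for row in matrix]
    targets = [row[t] for row in matrix]
    return inputs, targets


inputs, targets = split_inputs_targets(data, "target")
```

Keeping instances in rows means adding a sample is an append, while selecting a variable is a column slice.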
matlabdsandreporting MATLAB Fundamentals, Data Science & Report Generation 126 hours In the first part of this training, we cover the fundamentals of MATLAB and its function as both a language and a platform.  Included in this discussion is an introduction to MATLAB syntax, arrays and matrices, data visualization, script development, and object-oriented principles. In the second part, we demonstrate how to use MATLAB for data mining, machine learning and predictive analytics. To provide participants with a clear and practical perspective of MATLAB's approach and power, we draw comparisons between using MATLAB and using other tools such as spreadsheets, C, C++, and Visual Basic. In the third part of the training, participants learn how to streamline their work by automating their data processing and report generation. Throughout the course, participants will put into practice the ideas learned through hands-on exercises in a lab environment. By the end of the training, participants will have a thorough grasp of MATLAB's capabilities and will be able to employ it for solving real-world data science problems as well as for streamlining their work through automation. Assessments will be conducted throughout the course to gauge progress. Format of the course Course includes theoretical and practical exercises, including case discussions, sample code inspection, and hands-on implementation. Note Practice sessions will be based on pre-arranged sample data report templates. If you have specific requirements, please contact us to arrange. Introduction MATLAB for data science and reporting   Part 01: MATLAB fundamentals Overview     MATLAB for data analysis, visualization, modeling, and programming. 
Working with the MATLAB user interface Overview of MATLAB syntax Entering commands     Using the command line interface Creating variables     Numeric vs character data Analyzing vectors and matrices     Creating and manipulating     Performing calculations Visualizing vector and matrix data Working with data files     Importing data from Excel spreadsheets Working with data types     Working with table data Automating commands with scripts     Creating and running scripts     Organizing and publishing your scripts Writing programs with branching and loops     User interaction and flow control Writing functions     Creating and calling functions     Debugging with MATLAB Editor Applying object-oriented programming principles to your programs   Part 02: MATLAB for data science Overview     MATLAB for data mining, machine learning and predictive analytics Accessing data     Obtaining data from files, spreadsheets, and databases     Obtaining data from test equipment and hardware     Obtaining data from software and the Web Exploring data     Identifying trends, testing hypotheses, and estimating uncertainty Creating customized algorithms Creating visualizations Creating models Publishing customized reports Sharing analysis tools     As MATLAB code     As standalone desktop or Web applications Using the Statistics and Machine Learning Toolbox Using the Neural Network Toolbox   Part 03: Report generation Overview     Presenting results from MATLAB programs, applications, and sample data     Generating Microsoft Word, PowerPoint®, PDF, and HTML reports.     
Templated reports     Tailor-made reports         Using organization’s templates and standards Creating reports interactively vs programmatically     Using the Report Explorer     Using the DOM (Document Object Model) API Creating reports interactively using Report Explorer     Report Explorer Examples         Magic Squares Report Explorer Example     Creating reports         Using Report Explorer to create report setup file, define report structure and content     Formatting reports         Specifying default report style and format for Report Explorer reports     Generating reports         Configuring Report Explorer for processing and running report     Managing report conversion templates         Copying and managing Microsoft Word , PDF, and HTML conversion templates for Report Explorer reports     Customizing Report Conversion templates         Customizing the style and format of Microsoft Word and HTML conversion templates for Report Explorer reports     Customizing components and style sheets         Customizing report components, define layout style sheets Creating reports programmatically in MATLAB     Template-Based Report Object (DOM) API Examples         Functional report         Object-oriented report         Programmatic report formatting     Creating report content         Using the Document Object Model (DOM) API     Report format basics         Specifying format for report content     Creating form-based reports         Using the DOM API to fill in the blanks in a report form     Creating object-oriented reports         Deriving classes to simplify report creation and maintenance     Creating and formatting report objects         Lists, tables, and images     Creating DOM Reports from HTML         Appending HTML string or file to a Microsoft® Word, PDF, or HTML report generated by Document Object Model (DOM) API     Creating report templates         Creating templates to use with programmatic reports     Formatting page layouts         Formatting 
pages in Microsoft Word and PDF reports Summary and closing remarks
opencv Computer Vision with OpenCV 28 hours OpenCV (Open Source Computer Vision Library: http://opencv.org) is an open-source BSD-licensed library that includes several hundred computer vision algorithms. Audience This course is directed at engineers and architects seeking to utilize OpenCV for computer vision projects. Introduction Setting up OpenCV API concepts Main Modules The Core Functionality (core module) Image Processing (imgproc module) High Level GUI and Media (highgui module) Image Input and Output (imgcodecs module) Video Input and Output (videoio module) Camera calibration and 3D reconstruction (calib3d module) 2D Features framework (features2d module) Video analysis (video module) Object Detection (objdetect module) Machine Learning (ml module) Computational photography (photo module) OpenCV Viz Bonus topics GPU-Accelerated Computer Vision (cuda module) OpenCV iOS Bonus topics are not available as a part of a remote course. They can be delivered during classroom-based courses, but only by prior agreement, and only if both the trainer and all participants have laptops with supported NVIDIA GPUs (for the CUDA module) or MacBooks, Apple developer accounts and iOS-based mobile devices (for the iOS topic). NobleProg cannot guarantee the availability of trainers with the required hardware.
appliedml Applied Machine Learning 14 hours This training course is for people who would like to apply Machine Learning in practical applications. Audience This course is for data scientists and statisticians who have some familiarity with statistics and know how to program in R (or Python or another chosen language). The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose is to give practical applications of Machine Learning to participants interested in applying the methods at work. Sector specific examples are used to make the training relevant to the audience. Naive Bayes Multinomial models Bayesian categorical data analysis Discriminant analysis Linear regression Logistic regression GLM EM Algorithm Mixed Models Additive Models Classification KNN Bayesian Graphical Models Factor Analysis (FA) Principal Component Analysis (PCA) Independent Component Analysis (ICA) Support Vector Machines (SVM) for regression and classification Boosting Ensemble models Neural networks Hidden Markov Models (HMM) State Space Models Clustering
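KNN classification appears in the topic list above. A minimal sketch of the technique, using only the Python standard library, might look as follows. The function name `knn_predict` and the toy points are assumptions made for illustration; the course itself works in R or another chosen language.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-cluster toy data.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(points, labels, (2, 2))
near_b = knn_predict(points, labels, (8.5, 8.5))
```

An odd `k` avoids tied votes when there are two classes.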
opennmt OpenNMT: Setting up a Neural Machine Translation system 7 hours OpenNMT is a full-featured, open-source (MIT) neural machine translation system that utilizes the Torch mathematical toolkit. In this training participants will learn how to set up and use OpenNMT to carry out translation of various sample data sets. The course starts with an overview of neural networks as they apply to machine translation. Participants will carry out live exercises throughout the course to demonstrate their understanding of the concepts learned and get feedback from the instructor. By the end of this training, participants will have the knowledge and practice needed to implement a live OpenNMT solution. Source and target language samples will be pre-arranged per the audience's requirements. Audience Localization specialists with a technical background Global content managers Localization engineers Software developers in charge of implementing global content solutions Format of the course Part lecture, part discussion, heavy hands-on practice Introduction     Why Neural Machine Translation? Overview of the Torch project Installation and setup Preprocessing your data Training the model Translating Using pre-trained models Working with Lua scripts Using extensions Troubleshooting Joining the community Closing remarks
hadoopadm Hadoop Administration 21 hours The course is dedicated to IT specialists who are looking for a solution to store and process large data sets in a distributed system environment. Course goal: Gaining knowledge of Hadoop cluster administration Introduction to Cloud Computing and Big Data solutions Apache Hadoop evolution: HDFS, MapReduce, YARN Installation and configuration of Hadoop in Pseudo-distributed mode Running MapReduce jobs on Hadoop cluster Hadoop cluster planning, installation and configuration Hadoop ecosystem: Pig, Hive, Sqoop, HBase Big Data future: Impala, Cassandra
TalendDI Talend Open Studio for Data Integration 28 hours Talend Open Studio for Data Integration is an open-source data integration product used to combine, convert and update data in various locations across a business. In this instructor-led, live training, participants will learn how to use the Talend ETL tool to carry out data transformation, data extraction, and connectivity with Hadoop, Hive, and Pig.   By the end of this training, participants will be able to: Explain the concepts behind ETL (Extract, Transform, Load) and propagation Define ETL methods and ETL tools to connect with Hadoop Efficiently amass, retrieve, digest, consume, transform and shape big data in accordance with business requirements Audience Business intelligence professionals Project managers Database professionals SQL Developers ETL Developers Solution architects Data architects Data warehousing professionals System administrators and integrators Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.  
datavisR1 Introduction to Data Visualization with R 28 hours This course is intended for data engineers, decision makers and data analysts. It shows you how to create effective plots in RStudio that appeal to decision makers and help them uncover hidden information and make the right decisions.   Day 1: overview of R programming introduction to data visualization scatter plots and clusters the use of noise and jitter Day 2: other types of 2D and 3D plots histograms heat charts categorical data plotting Day 3: plotting KPIs with data R and X charts examples dashboards parallel axes mixing categorical data with numeric data Day 4: different hats of data visualization disguised and hidden trends case studies saving plots and loading Excel files
encogadv Encog: Advanced Machine Learning 14 hours Encog is an open-source machine learning framework for Java and .Net. In this instructor-led, live training, participants will learn advanced machine learning techniques for building accurate neural network predictive models. By the end of this training, participants will be able to: Implement different neural network optimization techniques to resolve underfitting and overfitting Understand and choose from a number of neural network architectures Implement supervised feed forward and feedback networks Audience Developers Analysts Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
tidyverse Introduction to Data Visualization with Tidyverse and R 7 hours The Tidyverse is a collection of versatile R packages for cleaning, processing, modeling, and visualizing data. Some of the packages included are: ggplot2, dplyr, tidyr, readr, purrr, and tibble. In this instructor-led, live training, participants will learn how to manipulate and visualize data using the tools included in the Tidyverse. By the end of this training, participants will be able to: Perform data analysis and create appealing visualizations Draw useful conclusions from various datasets of sample data Filter, sort and summarize data to answer exploratory questions Turn processed data into informative line plots, bar plots, histograms Import and filter data from diverse data sources, including Excel, CSV, and SPSS files Audience Beginners to the R language Beginners to data analysis and data visualization Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction     Tidyverse vs traditional R plotting Setting up your working environment Preparing the dataset Importing and filtering data Wrangling the data Visualizing the data (graphs, scatter plots) Grouping and summarizing the data Visualizing the data (line plots, bar plots, histograms, boxplots) Working with non-standard data Closing remarks
powerbiforbiandanalytics Power BI for Business Analysts 21 hours Microsoft Power BI is a free Software as a Service (SaaS) suite for analyzing data and sharing insights. Power BI dashboards provide a 360-degree view of the most important metrics in one place, updated in real time, and available on all of a user's devices. In this instructor-led, live training, participants will learn how to use Microsoft Power BI to analyze and visualize data using a series of sample data sets. By the end of this training, participants will be able to: Create visually compelling dashboards that provide valuable insights into data Obtain and integrate data from multiple data sources Build and share visualizations with team members Adjust data with Power BI Desktop Audience Business managers Business analysts Data analysts Business Intelligence (BI) and Data Warehouse (DW) teams Report developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice   Introduction Data Visualization Authoring in Power BI Desktop Creating reports Interacting with reports Uploading reports to the Power BI Service Revising report layouts Publishing to PowerBI.com Sharing and collaborating with team members Data Modeling Acquiring data Modeling data Security Working with DAX Refreshing the source data Securing data Advanced querying and data modeling Data modeling principles Complex DAX patterns Power BI tips and tricks Closing remarks
mlentre Machine Learning Concepts for Entrepreneurs and Managers 21 hours This training course is for people who would like to apply Machine Learning in practical applications for their team.  The training does not dive into technicalities; it revolves around basic concepts and their business/operational applications. Target Audience Investors and AI entrepreneurs Managers and Engineers whose company is venturing into AI space Business Analysts & Investors Introduction to Neural Networks Introduction to Applied Machine Learning Statistical learning vs. Machine learning Iteration and evaluation Bias-Variance trade-off Machine Learning with Python Choice of libraries Add-on tools Machine learning Concepts and Applications Regression Linear regression Generalizations and Nonlinearity Use cases Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Use Cases Cross-validation and Resampling Cross-validation approaches Bootstrap Use Cases Unsupervised Learning K-means clustering Examples Challenges of unsupervised learning and beyond K-means Short Introduction to NLP methods word and sentence tokenization text classification sentiment analysis spelling correction information extraction parsing meaning extraction question answering Artificial Intelligence & Deep Learning Technical Overview R vs Python Caffe vs TensorFlow Various Machine Learning Libraries
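K-means clustering is listed under Unsupervised Learning above. The sketch below shows the basic assign-then-update loop in plain Python. The deterministic initialization (first `k` points as centroids) is a simplifying assumption for reproducibility; practical implementations usually initialize randomly or with k-means++.

```python
import math


def kmeans(points, k, iterations=10):
    """Naive k-means: returns (centroids, clusters) after a fixed number of passes."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points as seeds
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Hypothetical toy data with two obvious groups.
toy_points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centroids, clusters = kmeans(toy_points, k=2)
```

The number of clusters `k` must be chosen up front, which is one of the challenges of unsupervised learning the outline refers to.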
nlpwithr NLP: Natural Language Processing with R 21 hours It is estimated that unstructured data accounts for more than 90 percent of all data, much of it in the form of text. Blog posts, tweets, social media, and other digital publications continuously add to this growing body of data. This course centers around extracting insights and meaning from this data. Utilizing the R Language and Natural Language Processing (NLP) libraries, we combine concepts and techniques from computer science, artificial intelligence, and computational linguistics to algorithmically understand the meaning behind text data. Data samples are available in various languages per customer requirements. By the end of this training participants will be able to prepare data sets (large and small) from disparate sources, then apply the right algorithms to analyze and report on its significance. Audience     Linguists and programmers Format of the course     Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding Introduction     NLP and R vs Python Installing and configuring R Studio Installing R packages related to Natural Language Processing (NLP). An overview of R’s text manipulation capabilities Getting started with an NLP project in R Reading and importing data files into R Text manipulation with R Document clustering in R Parts of speech tagging in R Sentence parsing in R Working with regular expressions in R Named-entity recognition in R Topic modeling in R Text classification in R Working with very large data sets Visualizing your results Optimization Integrating R with other languages (Java, Python, etc.) Closing remarks
dataminr Data Mining with R 14 hours Sources of methods Artificial intelligence Machine learning Statistics Sources of data Pre processing of data Data Import/Export Data Exploration and Visualization Dimensionality Reduction Dealing with missing values R Packages Data mining main tasks Automatic or semi-automatic analysis of large quantities of data Extracting previously unknown interesting patterns groups of data records (cluster analysis) unusual records (anomaly detection) dependencies (association rule mining) Data mining Anomaly detection (Outlier/change/deviation detection) Association rule learning (Dependency modeling) Clustering Classification Regression Summarization Frequent Pattern Mining Text Mining Decision Trees Regression Neural Networks Sequence Mining Frequent Pattern Mining Data dredging, data fishing, data snooping
datama Data Mining and Analysis 28 hours Objective: Delegates will be able to analyse big data sets, extract patterns, and choose the right variables impacting the results so that a new model can be built with predictive results. Data preprocessing Data Cleaning Data integration and transformation Data reduction Discretization and concept hierarchy generation Statistical inference Probability distributions, Random variables, Central limit theorem Sampling Confidence intervals Statistical Inference Hypothesis testing Multivariate linear regression Specification Subset selection Estimation Validation Prediction Classification methods Logistic regression Linear discriminant analysis K-nearest neighbours Naive Bayes Comparison of Classification methods Neural Networks Fitting neural networks Issues in training neural networks Decision trees Regression trees Classification trees Trees Versus Linear Models Bagging, Random Forests, Boosting Bagging Random Forests Boosting Support Vector Machines and Flexible disct Maximal Margin classifier Support vector classifiers Support vector machines 2 and more classes SVM’s Relationship to logistic regression Principal Components Analysis Clustering K-means clustering K-medoids clustering Hierarchical clustering Density based clustering Model Assessment and Selection Bias, Variance and Model complexity In-sample prediction error The Bayesian approach Cross-validation Bootstrap methods
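Cross-validation is one of the model assessment topics listed above. As a rough illustration, the following plain-Python sketch splits sample indices into k folds so that each sample is used for testing exactly once. The helper name `k_fold_indices` is hypothetical, and no shuffling is done, which a real workflow would normally add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

A model would be fitted on each `train` split and scored on the matching `test` split, and the k scores averaged.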
hadoopmapr Hadoop Administration on MapR 28 hours Audience: This course is intended to demystify big data/hadoop technology and to show it is not difficult to understand. Big Data Overview: What is Big Data Why Big Data is gaining popularity Big Data Case Studies Big Data Characteristics Solutions to work on Big Data. Hadoop & Its components: What is Hadoop and what are its components. Hadoop Architecture and its characteristics of Data it can handle /Process. Brief on Hadoop History, companies using it and why they have started using it. Hadoop Frame work & its components- explained in detail. What is HDFS and Reads -Writes to Hadoop Distributed File System. How to Setup Hadoop Cluster in different modes- Stand- alone/Pseudo/Multi Node cluster. (This includes setting up a Hadoop cluster in VirtualBox/KVM/VMware, Network configurations that need to be carefully looked into, running Hadoop Daemons and testing the cluster). What is Map Reduce frame work and how it works. Running Map Reduce jobs on Hadoop cluster. Understanding Replication , Mirroring and Rack awareness in context of Hadoop clusters. Hadoop Cluster Planning: How to plan your hadoop cluster. Understanding hardware-software to plan your hadoop cluster. Understanding workloads and planning cluster to avoid failures and perform optimum. What is MapR and why MapR : Overview of MapR and its architecture. Understanding & working of MapR Control System, MapR Volumes , snapshots & Mirrors. Planning a cluster in context of MapR. Comparison of MapR with other distributions and Apache Hadoop. MapR installation and cluster deployment. Cluster Setup & Administration: Managing services, nodes ,snapshots, mirror volumes and remote clusters. Understanding and managing Nodes. Understanding of Hadoop components, Installing Hadoop components alongside MapR Services. Accessing Data on cluster including via NFS Managing services & nodes. 
Managing data by using volumes, managing users and groups, managing & assigning roles to nodes, commissioning and decommissioning of nodes, cluster administration and performance monitoring, configuring, analyzing and monitoring metrics to monitor performance, configuring and administering MapR security. Understanding and working with M7- Native storage for MapR tables. Cluster configuration and tuning for optimum performance. Cluster upgrade and integration with other setups: Upgrading software version of MapR and types of upgrade. Configuring MapR cluster to access HDFS cluster. Setting up MapR cluster on Amazon Elastic MapReduce. All the above topics include demonstrations and practice sessions for learners to have hands-on experience of the technology.
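The course above covers what the MapReduce framework is and how it works. The sketch below imitates the map, shuffle and reduce phases in a single Python process with a word-count job. It does not use the Hadoop or MapR APIs; the function names are illustrative only, and a real cluster would run the phases in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())


counts = word_count(["big data", "Big clusters"])
```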
PentahoDI Pentaho Data Integration Fundamentals 21 hours Pentaho Data Integration is an open-source data integration tool for defining jobs and data transformations. In this instructor-led, live training, participants will learn how to use Pentaho Data Integration's powerful ETL capabilities and rich GUI to manage an entire big data lifecycle, maximizing the value of data to the organization. By the end of this training, participants will be able to: Create, preview, and run basic data transformations containing steps and hops Configure and secure the Pentaho Enterprise Repository Harness disparate sources of data and generate a single, unified version of the truth in an analytics-ready format Provide results to third-party applications for further processing Audience Data analysts ETL developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
cognitivecomputing Cognitive Computing: An Introduction for Business Managers 7 hours Cognitive computing refers to systems that encompass machine learning, reasoning, natural language processing, speech recognition and vision (object recognition), human–computer interaction, dialog and narrative generation, to name a few. A cognitive computing system is often composed of multiple technologies working together to process in-memory ‘hot’ contextual data as well as large sets of ‘cold’ historical data in batch. Examples of such technologies include Kafka, Spark, Elasticsearch, Cassandra and Hadoop. In this instructor-led, live training, participants will learn how Cognitive Computing complements AI and Big Data and how purpose-built systems can be used to realize human-like behaviors that improve the performance of human-machine interactions in business. By the end of this training, participants will understand: The relationship between cognitive computing and artificial intelligence (AI) The inherently probabilistic nature of cognitive computing and how to use it as a business advantage How to manage cognitive computing systems that behave in unexpected ways Which companies and software systems offer the most compelling cognitive computing solutions Audience Business managers Format of the course Lecture, case discussions and exercises To request a customized course outline for this training, please contact us.  
pythonadvml Python for Advanced Machine Learning 21 hours In this instructor-led, live training, participants will learn the most relevant and cutting-edge machine learning techniques in Python as they build a series of demo applications involving image, music, text, and financial data. By the end of this training, participants will be able to: Implement machine learning algorithms and techniques for solving complex problems Apply deep learning and semi-supervised learning to applications involving image, music, text, and financial data Push Python algorithms to their maximum potential Use libraries and packages such as NumPy and Theano Audience Developers Analysts Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
aitech Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP 21 hours Distributed big data Data mining methods (training single systems + distributed prediction: traditional machine learning algorithms + MapReduce distributed prediction) Apache Spark MLlib Recommendations and Advertising: Natural language Text clustering, text categorization (labeling), synonyms User profile restore, labeling system Recommendation algorithms Ensuring the accuracy of "lift" between and within categories How to create closed loops for recommendation algorithms Logistic regression, RankingSVM, Feature recognition (deep learning and automatic feature recognition for graphics) Natural language Chinese word segmentation Topic model (text clustering) Text classification Keyword extraction Semantic analysis, semantic parser, word2vec (word to vector) RNN long short-term memory (LSTM) architecture
druid Druid: Build a fast, real-time data analysis system 21 hours Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end-users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, Paypal, and Yahoo. In this course we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment. Audience     Application developers     Software engineers     Technical consultants     DevOps professionals     Architecture engineers Format of the course     Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding Introduction Installing and starting Druid Druid architecture and design Real-time ingestion of event data Sharding and indexing Loading data Querying data Visualizing data Running a distributed cluster Druid + Apache Hive Druid + Apache Kafka Druid + others Troubleshooting Administrative tasks
opennlp OpenNLP for Text Based Machine Learning 14 hours The Apache OpenNLP library is a machine learning based toolkit for processing natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and coreference resolution. In this instructor-led, live training, participants will learn how to create models for processing text based data using OpenNLP. Sample training data as well as customized data sets will be used as the basis for the lab exercises. By the end of this training, participants will be able to: Install and configure OpenNLP Download existing models as well as create their own Train the models on various sets of sample data Integrate OpenNLP with existing Java applications Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction to Machine Learning and Natural Language Processing Installing and Configuring OpenNLP Overview of OpenNLP's Library Structure Downloading Existing Models Calling OpenNLP's APIs Sentence Detection and Tokenization Part-of-Speech (POS) Tagging Phrase Chunking Parsing Name Finding English Coreference Training the Tools Creating a Model from Scratch Extending OpenNLP Closing remarks
pmml Predictive Models with PMML 7 hours This course is intended for scientists, developers, analysts, and anyone else who wants to standardize or exchange their models using the Predictive Model Markup Language (PMML) file format. Predictive Models Intro to predictive models Predictive models supported by PMML PMML Elements Header Data Dictionary Data Transformations Model Mining Schema Targets Output API Overview of API providers for PMML Executing your model in a cloud
cpb100 Google Cloud Platform Fundamentals: Big Data & Machine Learning 8 hours This one-day instructor-led course introduces participants to the big data capabilities of Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, participants get an overview of the Google Cloud platform and a detailed view of the data processing and machine learning capabilities. This course showcases the ease, flexibility, and power of big data solutions on Google Cloud Platform. This course teaches participants the following skills: Identify the purpose and value of the key Big Data and Machine Learning products in the Google Cloud Platform. Use Cloud SQL and Cloud Dataproc to migrate existing MySQL and Hadoop/Pig/Spark/Hive workloads to Google Cloud Platform. Employ BigQuery and Cloud Datalab to carry out interactive data analysis. Train and use a neural network using TensorFlow. Employ ML APIs. Choose between different data processing products on the Google Cloud Platform. This class is intended for the following: Data analysts, Data scientists, Business analysts getting started with Google Cloud Platform. Individuals responsible for designing pipelines and architectures for data processing, creating and maintaining machine learning and statistical models, querying datasets, visualizing query results and creating reports. Executives and IT decision makers evaluating Google Cloud Platform for use by data scientists. The course includes presentations, demonstrations, and hands-on labs. Module 1: Introducing Google Cloud Platform Google Platform Fundamentals Overview. Google Cloud Platform Data Products and Technology. Usage scenarios. Lab: Sign up for Google Cloud Platform. Module 2: Compute and Storage Fundamentals CPUs on demand (Compute Engine). A global filesystem (Cloud Storage). CloudShell. Lab: Set up a Ingest-Transform-Publish data processing pipeline. Module 3: Data Analytics on the Cloud Stepping-stones to the cloud. 
Cloud SQL: your SQL database on the cloud. Lab: Importing data into CloudSQL and running queries. Spark on Dataproc. Lab: Machine Learning Recommendations with SparkML. Module 4: Scaling Data Analysis Fast random access. Datalab. BigQuery. Lab: Build machine learning dataset. Machine Learning with TensorFlow. Lab: Train and use neural network. Fully built models for common needs. Lab: Employ ML APIs Module 5: Data Processing Architectures Message-oriented architectures with Pub/Sub. Creating pipelines with Dataflow. Reference architecture for real-time and batch data processing. Module 6: Summary Why GCP? Where to go from here Additional Resources
mlfsas Machine Learning Fundamentals with Scala and Apache Spark 14 hours The aim of this course is to provide a basic proficiency in applying Machine Learning methods in practice. Through the use of the Scala programming language and its various libraries, and based on a multitude of practical examples this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, interpret the outputs of the algorithms and validate the results. Our goal is to give you the skills to understand and use the most fundamental tools from the Machine Learning toolbox confidently and avoid the common pitfalls of Data Sciences applications. Introduction to Applied Machine Learning Statistical learning vs. Machine learning Iteration and evaluation Bias-Variance trade-off Machine Learning with Python Choice of libraries Add-on tools Regression Linear regression Generalizations and Nonlinearity Exercises Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Exercises Cross-validation and Resampling Cross-validation approaches Bootstrap Exercises Unsupervised Learning K-means clustering Examples Challenges of unsupervised learning and beyond K-means
hdp Hortonworks Data Platform (HDP) for administrators 21 hours Hortonworks Data Platform is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem. This instructor-led live training introduces Hortonworks and walks participants through the deployment of Spark + Hadoop solution. By the end of this training, participants will be able to: Use Hortonworks to reliably run Hadoop at a large scale Unify Hadoop's security, governance, and operations capabilities with Spark's agile analytic workflows. Use Hortonworks to investigate, validate, certify and support each of the components in a Spark project Process different types of data, including structured, unstructured, in-motion, and at-rest. Audience Hadoop administrators Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.  
datamin Data Mining 21 hours The course can be provided with any tools, including free open-source data mining software and applications. Introduction Data mining as the analysis step of the KDD process ("Knowledge Discovery in Databases") Subfield of computer science Discovering patterns in large data sets Sources of methods Artificial intelligence Machine learning Statistics Database systems What is involved? Database and data management aspects Data pre-processing Model and inference considerations Interestingness metrics Complexity considerations Post-processing of discovered structures Visualization Online updating Data mining main tasks Automatic or semi-automatic analysis of large quantities of data Extracting previously unknown interesting patterns groups of data records (cluster analysis) unusual records (anomaly detection) dependencies (association rule mining) Data mining Anomaly detection (Outlier/change/deviation detection) Association rule learning (Dependency modeling) Clustering Classification Regression Summarization Use and applications Able Danger Behavioral analytics Business analytics Cross Industry Standard Process for Data Mining Customer analytics Data mining in agriculture Data mining in meteorology Educational data mining Human genetic clustering Inference attack Java Data Mining Open-source intelligence Path analysis (computing) Reactive business intelligence Data dredging, data fishing, data snooping
datavisualizationreports Data Visualization: Creating Captivating Reports 21 hours In this instructor-led, live training, participants will learn the skills, strategies, tools and approaches for visualizing and reporting data for different audiences. Case studies are also analyzed and discussed to exemplify how data visualization solutions are being applied in the real world to derive meaning out of data and answer crucial questions. By the end of this training, participants will be able to: Write reports with captivating titles, subtitles, and annotations using the most suitable highlighting, alignment, and color schemes for readability and user friendliness. Design charts that fit the audience's information needs and interests Choose the best chart types for a given dataset (beyond pie charts and bar charts) Identify and analyze the most valuable and relevant data quickly and efficiently Select the best file formats to include in reports (graphs, infographics, references, GIFs, etc.) Create effective layouts for displaying time series data, part-to-whole relationships, geographic patterns, and nested data Use effective color-coding to display qualitative and text-based data such as sentiment analysis, timelines, calendars, and diagrams Apply the most suitable tools for the job (Excel, R, Tableau, mapping programs, etc.) Prepare datasets for visualization Audience Data analysts Business managers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction to data visualization Selecting and creating effective reports Data visualization tools and resources Generating and revising your visualizations Closing remarks
IntroToAvro Apache Avro: Data serialization for distributed applications 14 hours This course is intended for Developers Format of the course Lectures, hands-on practice, small tests along the way to gauge understanding Principles of distributed computing Apache Spark Hadoop Principles of data serialization How data object is passed over the network Serialization of objects Serialization approaches Thrift Protocol Buffers Apache Avro data structure size, speed, format characteristics persistent data storage integration with dynamic languages dynamic typing schemas untagged data change management Data serialization and distributed computing Avro as a subproject of Hadoop Java serialization Hadoop serialization Avro serialization Using Avro with Hive (AvroSerDe) Pig (AvroStorage) Porting Existing RPC Frameworks
matlabdl Matlab for Deep Learning 14 hours In this instructor-led, live training, participants will learn how to use Matlab to design, build, and visualize a convolutional neural network for image recognition. By the end of this training, participants will be able to: Build a deep learning model Automate data labeling Work with models from Caffe and TensorFlow-Keras Train data using multiple GPUs, the cloud, or clusters Audience Developers Engineers Domain experts Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
mlbankingpython Machine Learning for Banking (with Python) - Bespoke 28 hours In this instructor-led, live training, participants will learn how to apply machine learning techniques and tools for solving real-world problems in the banking industry. Deep learning techniques are covered in the latter part of the course. Python will be used as the programming language. Participants first learn the key principles, then put their knowledge into practice by building their own machine learning models and using them to complete live team projects. Introduction Difference between statistical learning (statistical analysis) and machine learning Adoption of machine learning technology and talent by finance and banking companies Different Types of Machine Learning Supervised learning vs unsupervised learning Iteration and evaluation Bias-variance trade-off Combining supervised and unsupervised learning (semi-supervised learning) Machine Learning Languages and Toolsets Open source vs proprietary systems and software Python vs R vs Matlab Libraries and frameworks Machine Learning Case Studies Consumer data and big data Assessing risk in consumer and business lending Improving customer service through sentiment analysis Detecting identity fraud, billing fraud and money laundering Hands-on: Python for Machine Learning Preparing the Development Environment Obtaining Python machine learning libraries and packages Working with scikit-learn and PyBrain How to Load Machine Learning Data Databases, data warehouses and streaming data Distributed storage and processing with Hadoop and Spark Exported data and Excel Modeling Business Decisions with Supervised Learning Classifying your data (classification) Using regression analysis to predict outcome Choosing from available machine learning algorithms Understanding decision tree algorithms Understanding random forest algorithms Model evaluation Exercise Regression Analysis Linear regression Generalizations and Nonlinearity Exercise 
Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Exercise Hands-on: Building an Estimation Model Assessing lending risk based on customer type and history Evaluating the performance of Machine Learning Algorithms Cross-validation and resampling Bootstrap aggregation (bagging) Exercise Modeling Business Decisions with Unsupervised Learning K-means clustering Challenges of unsupervised learning Beyond K-means Exercise Hands-on: Building a Recommendation System Analyzing past customer behavior to improve new service offerings Introduction to Neural Networks and Deep Learning Layers and nodes Convolutional neural networks Recurrent neural networks Multilayer perceptrons Frameworks: Theano, TensorFlow, Keras Exercise Hands-on: Building an AI system Monitoring big data to detect money laundering and billing fraud Extending your company's capabilities Developing models in the cloud Accelerating machine learning with GPU Beyond machine learning: Artificial Intelligence (AI) Applying neural networks for computer vision, voice recognition and text analysis Closing Remarks
mdlmrah Model MapReduce and Apache Hadoop 14 hours The course is intended for IT specialists who work with the distributed processing of large data sets across clusters of computers. Data Mining and Business Intelligence Introduction Area of application Capabilities Basics of data exploration Big data What does Big data stand for? Big data and Data mining MapReduce Model basics Example application Stats Cluster model Hadoop What is Hadoop Installation Configuration Cluster settings Architecture and configuration of Hadoop Distributed File System Console tools DistCp tool MapReduce and Hadoop Streaming Administration and configuration of Hadoop On Demand Alternatives
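The "MapReduce Model basics" and "Example application" topics in the outline above can be illustrated with a minimal, single-process sketch of a word count. This is not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are assumptions made for illustration, and a real cluster would distribute each phase across machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical sample input, two tiny "documents".
counts = word_count(["big data big clusters", "data mining"])
```

Because the map step handles each document independently and the reduce step handles each key independently, both phases could in principle run in parallel, which is the property the model relies on.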
mlfunpython Machine Learning Fundamentals with Python 14 hours The aim of this course is to provide a basic proficiency in applying Machine Learning methods in practice. Through the use of the Python programming language and its various libraries, and based on a multitude of practical examples this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, interpret the outputs of the algorithms and validate the results. Our goal is to give you the skills to understand and use the most fundamental tools from the Machine Learning toolbox confidently and avoid the common pitfalls of Data Sciences applications. Introduction to Applied Machine Learning Statistical learning vs. Machine learning Iteration and evaluation Bias-Variance trade-off Machine Learning with Python Choice of libraries Add-on tools Regression Linear regression Generalizations and Nonlinearity Exercises Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Exercises Cross-validation and Resampling Cross-validation approaches Bootstrap Exercises Unsupervised Learning K-means clustering Examples Challenges of unsupervised learning and beyond K-means
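As a rough illustration of the "Cross-validation approaches" topic in the outline above, the following sketch splits sample indices into k folds using only the Python standard library. The helper name k_fold_indices is hypothetical and not part of the course material; in practice the data is usually shuffled first, and a library routine would normally be used instead.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Ten samples split into three folds of sizes 4, 3 and 3.
folds = k_fold_indices(10, 3)
```

Each sample appears in exactly one test fold, so averaging a model's score over the folds uses every observation for validation once.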
cpde Data Engineering on Google Cloud Platform 32 hours This four-day instructor-led class provides participants a hands-on introduction to designing and building data processing systems on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, participants will learn how to design data processing systems, build end-to-end data pipelines, analyze data and carry out machine learning. The course covers structured, unstructured, and streaming data. This course teaches participants the following skills: Design and build data processing systems on Google Cloud Platform Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow Derive business insights from extremely large datasets using Google BigQuery Train, evaluate and predict using machine learning models using Tensorflow and Cloud ML Leverage unstructured data using Spark and ML APIs on Cloud Dataproc Enable instant insights from streaming data This class is intended for experienced developers who are responsible for managing big data transformations including: Extracting, Loading, Transforming, cleaning, and validating data Designing pipelines and architectures for data processing Creating and maintaining machine learning and statistical models Querying datasets, visualizing query results and creating reports The course includes presentations, demonstrations, and hands-on labs. Leveraging Unstructured Data with Cloud Dataproc on Google Cloud Platform Module 1: Google Cloud Dataproc Overview Creating and managing clusters. Leveraging custom machine types and preemptible worker nodes. Scaling and deleting Clusters. Lab: Creating Hadoop Clusters with Google Cloud Dataproc. Module 2: Running Dataproc Jobs Running Pig and Hive jobs. Separation of storage and compute. Lab: Running Hadoop and Spark Jobs with Dataproc. Lab: Submit and monitor jobs. Module 3: Integrating Dataproc with Google Cloud Platform Customize cluster with initialization actions.
BigQuery Support. Lab: Leveraging Google Cloud Platform Services. Module 4: Making Sense of Unstructured Data with Google’s Machine Learning APIs Google’s Machine Learning APIs. Common ML Use Cases. Invoking ML APIs. Lab: Adding Machine Learning Capabilities to Big Data Analysis. Serverless Data Analysis with Google BigQuery and Cloud Dataflow Module 5: Serverless data analysis with BigQuery What is BigQuery. Queries and Functions. Lab: Writing queries in BigQuery. Loading data into BigQuery. Exporting data from BigQuery. Lab: Loading and exporting data. Nested and repeated fields. Querying multiple tables. Lab: Complex queries. Performance and pricing. Module 6: Serverless, autoscaling data pipelines with Dataflow The Beam programming model. Data pipelines in Beam Python. Data pipelines in Beam Java. Lab: Writing a Dataflow pipeline. Scalable Big Data processing using Beam. Lab: MapReduce in Dataflow. Incorporating additional data. Lab: Side inputs. Handling stream data. GCP Reference architecture. Serverless Machine Learning with TensorFlow on Google Cloud Platform Module 7: Getting started with Machine Learning What is machine learning (ML). Effective ML: concepts, types. ML datasets: generalization. Lab: Explore and create ML datasets. Module 8: Building ML models with Tensorflow Getting started with TensorFlow. Lab: Using tf.learn. TensorFlow graphs and loops + lab. Lab: Using low-level TensorFlow + early stopping. Monitoring ML training. Lab: Charts and graphs of TensorFlow training. Module 9: Scaling ML models with CloudML Why Cloud ML? Packaging up a TensorFlow model. End-to-end training. Lab: Run a ML model locally and on cloud. Module 10: Feature Engineering Creating good features. Transforming inputs. Synthetic features. Preprocessing with Cloud ML. Lab: Feature engineering. Building Resilient Streaming Systems on Google Cloud Platform Module 11: Architecture of streaming analytics pipelines Stream data processing: Challenges. 
Handling variable data volumes. Dealing with unordered/late data. Lab: Designing streaming pipeline. Module 12: Ingesting Variable Volumes What is Cloud Pub/Sub? How it works: Topics and Subscriptions. Lab: Simulator. Module 13: Implementing streaming pipelines Challenges in stream processing. Handle late data: watermarks, triggers, accumulation. Lab: Stream data processing pipeline for live traffic data. Module 14: Streaming analytics and dashboards Streaming analytics: from data to decisions. Querying streaming data with BigQuery. What is Google Data Studio? Lab: build a real-time dashboard to visualize processed data. Module 15: High throughput and low-latency with Bigtable What is Cloud Spanner? Designing Bigtable schema. Ingesting into Bigtable. Lab: streaming into Bigtable.  
tf101 Deep Learning with TensorFlow 21 hours TensorFlow is a 2nd Generation API of Google's open source software library for Deep Learning. The system is designed to facilitate research in machine learning, and to make it quick and easy to transition from research prototype to production system. Audience This course is intended for engineers seeking to use TensorFlow for their Deep Learning projects After completing this course, delegates will: understand TensorFlow’s structure and deployment mechanisms be able to carry out installation / production environment / architecture tasks and configuration be able to assess code quality, perform debugging, monitoring be able to implement advanced production like training models, building graphs and logging Machine Learning and Recurrent Neural Networks (RNN) basics NN and RNN Backpropagation Long short-term memory (LSTM) TensorFlow Basics Creation, Initializing, Saving, and Restoring TensorFlow variables Feeding, Reading and Preloading TensorFlow Data How to use TensorFlow infrastructure to train models at scale Visualizing and Evaluating models with TensorBoard TensorFlow Mechanics 101 Prepare the Data Download Inputs and Placeholders Build the Graph Inference Loss Training Train the Model The Graph The Session Train Loop Evaluate the Model Build the Eval Graph Eval Output Advanced Usage Threading and Queues Distributed TensorFlow Writing Documentation and Sharing your Model Customizing Data Readers Using GPUs¹ Manipulating TensorFlow Model Files TensorFlow Serving Introduction Basic Serving Tutorial Advanced Serving Tutorial Serving Inception Model Tutorial ¹ The Advanced Usage topic, “Using GPUs”, is not available as a part of a remote course. This module can be delivered during classroom-based courses, but only by prior agreement, and only if both the trainer and all participants have laptops with supported NVIDIA GPUs, with 64-bit Linux installed (not provided by NobleProg).
NobleProg cannot guarantee the availability of trainers with the required hardware.
magellan Magellan: Geospatial Analytics on Spark 14 hours Magellan is an open-source distributed execution engine for geospatial analytics on big data. Implemented on top of Apache Spark, it extends Spark SQL and provides a relational abstraction for geospatial analytics. This instructor-led, live training introduces the concepts and approaches for implementing geospatial analytics and walks participants through the creation of a predictive analysis application using Magellan on Spark. By the end of this training, participants will be able to: Efficiently query, parse and join geospatial datasets at scale Implement geospatial data in business intelligence and predictive analytics applications Use spatial context to extend the capabilities of mobile devices, sensors, logs, and wearables Audience Application developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
datavault Data Vault: Building a Scalable Data Warehouse 28 hours Data vault modeling is a database modeling technique that provides long-term historical storage of data that originates from multiple sources. A data vault stores a single version of the facts, or "all the data, all of the time". Its flexible, scalable, consistent and adaptable design encompasses the best aspects of 3rd normal form (3NF) and star schema. In this instructor-led, live training, participants will learn how to build a Data Vault. By the end of this training, participants will be able to: Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI. Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse Develop a consistent and repeatable ETL (Extract, Transform, Load) process Build and deploy highly scalable and repeatable warehouses Audience Data modelers Data warehousing specialists Business Intelligence specialists Data engineers Database administrators Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction The shortcomings of existing data warehouse data modeling architectures Benefits of Data Vault modeling Overview of Data Vault architecture and design principles SEI / CMM / Compliance Data Vault applications Dynamic Data Warehousing Exploration Warehousing In-Database Data Mining Rapid Linking of External Information Data Vault components Hubs, Links, Satellites Building a Data Vault Modeling Hubs, Links and Satellites Data Vault reference rules How components interact with each other Modeling and populating a Data Vault Converting 3NF OLTP to a Data Vault Enterprise Data Warehouse (EDW) Understanding load dates, end-dates, and join operations Business keys, relationships, link tables and join techniques Query techniques Load processing and query processing Overview of Matrix Methodology
Getting data into data entities Loading Hub Entities Loading Link Entities Loading Satellites Using SEI/CMM Level 5 templates to obtain repeatable, reliable, and quantifiable results Developing a consistent and repeatable ETL (Extract, Transform, Load) process Building and deploying highly scalable and repeatable warehouses Closing remarks  
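The Hubs, Links and Satellites named in the outline above can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the class and field names, the sample customer and order keys, and the choice of SHA-256 over normalized business keys as the surrogate key. A real Data Vault would live in database tables with its own key and loading conventions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys):
    """Derive a deterministic surrogate key from one or more business keys."""
    joined = "|".join(str(key).strip().upper() for key in business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Hub:
    """One row per unique business key."""
    hub_key: str
    business_key: str
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class Link:
    """A relationship between two or more hubs."""
    link_key: str
    hub_keys: tuple
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class Satellite:
    """Descriptive attributes of a hub or link, versioned by load date."""
    parent_key: str
    load_date: datetime
    attributes: tuple  # (name, value) pairs valid from load_date
    record_source: str


# Hypothetical sample data: a customer, an order, and the link between them.
now = datetime.now(timezone.utc)
customer = Hub(hash_key("C-1001"), "C-1001", now, "crm")
order = Hub(hash_key("O-42"), "O-42", now, "shop")
placed = Link(hash_key("C-1001", "O-42"), (customer.hub_key, order.hub_key), now, "shop")
details = Satellite(customer.hub_key, now, (("name", "Ada"),), "crm")
```

Keeping descriptive attributes in satellites rather than in the hub means a changed attribute adds a new satellite row with a later load date, leaving earlier rows available for auditing.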
nlg Python for Natural Language Generation 21 hours Natural language generation (NLG) refers to the production of natural language text or speech by a computer. In this instructor-led, live training, participants will learn how to use Python to produce high-quality natural language text by building their own NLG system from scratch. Case studies will also be examined and discussed to appreciate the real-world uses of NLG for generating content. By the end of this training, participants will be able to: Use NLG to automatically generate content for various industries, from journalism, to real estate, to weather and sports reporting Select and organize source content, plan sentences, and prepare a system for automatic generation of original content Understand the NLG pipeline and apply the right techniques at each stage Understand the architecture of a Natural Language Generation (NLG) system Implement the most suitable algorithms and models for analysis and ordering Pull data from publicly available data sources as well as curated databases to use as material for generated text Replace manual and laborious writing processes with computer-generated, automated content creation Audience Developers Data scientists Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us.
Piwik Getting started with Piwik 21 hours Audience Web analysts Data analysts Market researchers Marketing and sales professionals System administrators Format of the course Part lecture, part discussion, heavy hands-on practice Introduction to Piwik Why use Piwik? Piwik vs Google Analytics Setting up Piwik Selecting which websites to monitor Working with the dashboard Understanding visitor activity Actions Referrals Generating reports
matlabpredanalytics Matlab for Predictive Analytics 21 hours Predictive analytics is the process of using data analytics to make predictions about the future. This process uses data along with data mining, statistics, and machine learning techniques to create a predictive model for forecasting future events. In this instructor-led, live training, participants will learn how to use Matlab to build predictive models and apply them to large sample data sets to predict future events based on the data. By the end of this training, participants will be able to: Create predictive models to analyze patterns in historical and transactional data Use predictive modeling to identify risks and opportunities Build mathematical models that capture important trends Use data from devices and business systems to reduce waste, save time, or cut costs Audience Developers Engineers Domain experts Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction Predictive analytics in finance, healthcare, pharmaceuticals, automotive, aerospace, and manufacturing Overview of Big Data concepts Capturing data from disparate sources What are data-driven predictive models? Overview of statistical and machine learning techniques Case study: predictive maintenance and resource planning Applying algorithms to large data sets with Hadoop and Spark Predictive Analytics Workflow Accessing and exploring data Preprocessing the data Developing a predictive model Training, testing and validating a data set Applying different machine learning approaches (time-series regression, linear regression, etc.) Integrating the model into existing web applications, mobile devices, embedded systems, etc.
Matlab and Simulink integration with embedded systems and enterprise IT workflows Creating portable C and C++ code from MATLAB code Deploying predictive applications to large-scale production systems, clusters, and clouds Acting on the results of your analysis Next steps: Automatically responding to findings using Prescriptive Analytics Closing remarks
matlabfundamentalsfinance MATLAB Fundamentals + MATLAB for Finance 35 hours This course provides a comprehensive introduction to the MATLAB technical computing environment + an introduction to using MATLAB for financial applications. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include: Working with the MATLAB user interface Entering commands and creating variables Analyzing vectors and matrices Visualizing vector and matrix data Working with data files Working with data types Automating commands with scripts Writing programs with logic and flow control Writing functions Using the Financial Toolbox for quantitative analysis Part 1 A Brief Introduction to MATLAB Objectives: Offer an overview of what MATLAB is, what it consists of, and what it can do for you An Example: C vs. MATLAB MATLAB Product Overview MATLAB Application Fields What can MATLAB do for you? The Course Outline Working with the MATLAB User Interface Objective: Get an introduction to the main features of the MATLAB integrated design environment and its user interfaces. Get an overview of course themes. MATLAB Interface Reading data from file Saving and loading variables Plotting data Customizing plots Calculating statistics and best-fit line Exporting graphics for use in other applications Variables and Expressions Objective: Enter MATLAB commands, with an emphasis on creating and accessing data in variables. Entering commands Creating variables Getting help Accessing and modifying values in variables Creating character variables Analysis and Visualization with Vectors Objective: Perform mathematical and statistical calculations with vectors, and create basic visualizations. See how MATLAB syntax enables calculations on whole data sets with a single command.
Calculations with vectors
Plotting vectors
Basic plot options
Annotating plots

Analysis and Visualization with Matrices
Objective: Use matrices as mathematical objects or as collections of (vector) data. Understand the appropriate use of MATLAB syntax to distinguish between these applications.
Size and dimensionality
Calculations with matrices
Statistics with matrix data
Plotting multiple columns
Reshaping and linear indexing
Multidimensional arrays

Part 2

Automating Commands with Scripts
Objective: Collect MATLAB commands into scripts for ease of reproduction and experimentation. As the complexity of your tasks increases, entering long sequences of commands in the Command Window becomes impractical.
A Modelling Example
The Command History
Creating script files
Running scripts
Comments and Code Cells
Publishing scripts

Working with Data Files
Objective: Bring data into MATLAB from formatted files. Because imported data can be of a wide variety of types and formats, emphasis is given to working with cell arrays and date formats.
Importing data
Mixed data types
Cell arrays
Conversions amongst numerals, strings, and cells
Exporting data

Multiple Vector Plots
Objective: Make more complex vector plots, such as multiple plots, and use color and string manipulation techniques to produce eye-catching visual representations of data.
Graphics structure
Multiple figures, axes, and plots
Plotting equations
Using color
Customizing plots

Logic and Flow Control
Objective: Use logical operations, variables, and indexing techniques to create flexible code that can make decisions and adapt to different situations. Explore other programming constructs for repeating sections of code, and constructs that allow interaction with the user.
Logical operations and variables
Logical indexing
Programming constructs
Flow control
Loops

Matrix and Image Visualization
Objective: Visualize images and matrix data in two or three dimensions.
Explore the difference in displaying images and visualizing matrix data using images.
Scattered Interpolation using vector and matrix data
3-D matrix visualization
2-D matrix visualization
Indexed images and colormaps
True color images

Part 3

Data Analysis
Objective: Perform typical data analysis tasks in MATLAB, including developing and fitting theoretical models to real-life data. This leads naturally to one of the most powerful features of MATLAB: solving linear systems of equations with a single command.
Dealing with missing data
Correlation
Smoothing
Spectral analysis and FFTs
Solving linear systems of equations

Writing Functions
Objective: Increase automation by encapsulating modular tasks as user-defined functions. Understand how MATLAB resolves references to files and variables.
Why functions?
Creating functions
Adding comments
Calling subfunctions
Workspaces
Subfunctions
Path and precedence

Data Types
Objective: Explore data types, focusing on the syntax for creating variables and accessing array elements, and discuss methods for converting among data types. Data types differ in the kind of data they may contain and the way the data is organized.
MATLAB data types
Integers
Structures
Converting types

File I/O
Objective: Explore the low-level data import and export functions in MATLAB that allow precise control over text and binary file I/O. These functions include textscan, which provides precise control of reading text files.
Opening and closing files
Reading and writing text files
Reading and writing binary files

Note that the content actually delivered might differ slightly from the outline above without prior notification.

Part 4

Overview of the MATLAB Financial Toolbox
Objective: Learn to apply the various features included in the MATLAB Financial Toolbox to perform quantitative analysis for the financial industry. Gain the knowledge and practice needed to efficiently develop real-world applications involving financial data.
Asset Allocation and Portfolio Optimization
Risk Analysis and Investment Performance
Fixed-Income Analysis and Option Pricing
Financial Time Series Analysis
Regression and Estimation with Missing Data
Technical Indicators and Financial Charts
Monte Carlo Simulation of SDE Models

Asset Allocation and Portfolio Optimization
Objective: Perform capital allocation, asset allocation, and risk assessment.
Estimating asset return and total return moments from price or return data
Computing portfolio-level statistics, such as mean, variance, value at risk (VaR), and conditional value at risk (CVaR)
Performing constrained mean-variance portfolio optimization and analysis
Examining the time evolution of efficient portfolio allocations
Performing capital allocation
Accounting for turnover and transaction costs in portfolio optimization problems

Risk Analysis and Investment Performance
Objective: Define and solve portfolio optimization problems.
Specifying a portfolio name, the number of assets in an asset universe, and asset identifiers
Defining an initial portfolio allocation

Fixed-Income Analysis and Option Pricing
Objective: Perform fixed-income analysis and option pricing.
Analyzing cash flow
Performing SIA-compliant fixed-income security analysis
Performing basic Black-Scholes, Black, and binomial option pricing

Part 5

Financial Time Series Analysis
Objective: Analyze time series data in financial markets.
Performing data math
Transforming and analyzing data
Technical analysis
Charting and graphics

Regression and Estimation with Missing Data
Objective: Perform multivariate normal regression with or without missing data.
Performing common regressions
Estimating log-likelihood function and standard errors for hypothesis testing
Completing calculations when data is missing

Technical Indicators and Financial Charts
Objective: Practice using performance metrics and specialized plots.
Moving averages
Oscillators, stochastics, indexes, and indicators
Maximum drawdown and expected maximum drawdown
Charts, including Bollinger bands, candlestick plots, and moving averages

Monte Carlo Simulation of SDE Models
Objective: Create simulations and apply SDE models
Brownian Motion (BM)
Geometric Brownian Motion (GBM)
Constant Elasticity of Variance (CEV)
Cox-Ingersoll-Ross (CIR)
Hull-White/Vasicek (HWV)
Heston

Conclusion
Objectives: Summarise what we have learned
A summary of the course
Other upcoming courses on MATLAB

Note: the actual content delivered might differ from the outline as a result of customer requirements and the time spent on each topic.
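For readers who want a feel for the "Monte Carlo Simulation of SDE Models" topic, the following is a minimal sketch of simulating Geometric Brownian Motion (GBM). It is written in plain Python rather than MATLAB and does not use the Financial Toolbox. The function names `simulate_gbm` and `mean_terminal_value`, as well as all parameter values, are illustrative assumptions and are not taken from the course material.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, horizon, steps, rng):
    """Simulate one GBM path with the exact log-normal update.

    S(t + dt) = S(t) * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z),
    where Z is a standard normal draw.
    """
    dt = horizon / steps
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    path = [s0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp(drift + vol * z))
    return path


def mean_terminal_value(s0, mu, sigma, horizon, steps, n_paths, seed=0):
    """Monte Carlo estimate of E[S(T)] from n_paths independent paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += simulate_gbm(s0, mu, sigma, horizon, steps, rng)[-1]
    return total / n_paths


if __name__ == "__main__":
    # Hypothetical inputs: start price 100, 5% drift, 20% volatility, 1 year.
    estimate = mean_terminal_value(100.0, 0.05, 0.2, 1.0, 50, 5000)
    exact = 100.0 * math.exp(0.05 * 1.0)  # closed form: E[S(T)] = S0 * exp(mu * T)
    print(f"Monte Carlo: {estimate:.2f}, closed form: {exact:.2f}")
```

The Monte Carlo average should approach the closed-form expectation as the number of paths grows, which gives a simple sanity check for any simulation of this kind.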
bldrools Managing Business Logic with Drools 21 hours

This course is aimed at enterprise architects, business and system analysts, technical managers and developers who want to apply business rules to their solutions. This course contains a lot of simple hands-on exercises during which the participants will create working rules. Please refer to our other courses if you just need an overview of Drools. This course is usually delivered on the newest stable version of Drools and jBPM, but in case of a bespoke course, can be tailored to a specific version.

Short Introduction to Rule Engines
Artificial Intelligence
Expert Systems
What is a Rule Engine?
Why use a Rule Engine?
Advantages of a Rule Engine
When should you use a Rule Engine?
Scripting or Process Engines
When you should NOT use a Rule Engine
Strong and Loose Coupling
What are rules?

Creating and Implementing Rules
Fact Model
KIE
Rules versioning and repository
Exercises

Domain Specific Language (DSL)
Replacing rules with DSL
Testing DSL rules
Exercises

jBPM Integration with Drools
Short overview of basic BPMN
Invoking rules from a process
Grouping rules
Exercises

Fusion
What is Complex Event Processing?
Short overview of Fusion
Exercises

MVEL - the rule language
Filtering (fact type, field)
Operators
Compound conditions
Operator priority
Accumulate Functions (average, min, max, sum, collectList, etc.)

Rete - under the hood
Compilation algorithm
Drools RETE extensions
Node Types
Understanding the Rete Tree
Rete Optimization

Rules Testing
Testing with KIE
Testing with JUnit

OptaPlanner
An overview of OptaPlanner
Simple examples

Integrating Rules with Applications
Invoking rules from Java Code
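To illustrate the "What is a Rule Engine?" and "Fact Model" topics in general terms, here is a minimal sketch of forward chaining over a set of facts. It is plain Python, not Drools: it does not use DRL, KIE, or the Rete algorithm, and simply re-checks every rule on each cycle. The `Rule` class, the `run_rules` function, and the example facts are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, object]


@dataclass
class Rule:
    """A named condition/action pair (hypothetical structure, not Drools' DRL)."""
    name: str
    when: Callable[[Facts], bool]   # condition evaluated against the facts
    then: Callable[[Facts], None]   # action that may add or change facts


def run_rules(facts: Facts, rules: List[Rule], max_cycles: int = 100) -> List[str]:
    """Naive forward chaining: fire each rule at most once until nothing new fires."""
    fired: List[str] = []
    for _ in range(max_cycles):
        progressed = False
        for rule in rules:
            if rule.name not in fired and rule.when(facts):
                rule.then(facts)
                fired.append(rule.name)
                progressed = True
        if not progressed:
            break
    return fired


# Rules are declared independently of evaluation order: the discount rule
# only becomes applicable after the gold-customer rule has added its fact.
rules = [
    Rule("apply-discount",
         when=lambda f: f.get("gold_customer") is True,
         then=lambda f: f.update(discount=0.1)),
    Rule("gold-customer",
         when=lambda f: f["order_total"] > 1000,
         then=lambda f: f.update(gold_customer=True)),
]

facts = {"order_total": 1500}
fired = run_rules(facts, rules)
print(fired, facts)
```

The point of the sketch is the separation of concerns: the rules state what should hold, while the engine decides when each rule fires. A production engine such as Drools replaces the naive loop with far more efficient matching.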

Upcoming Courses

Course | Course Date | Course Price [Remote / Classroom]
Artificial Intelligence Overview - Dubai | Thu, 2017-12-21 09:30 | 1575USD / 2625USD




Some of our clients