Course Outline

Introduction to Big Data Analytics in Health

Overview of Big Data Analytics Technologies

  • Apache Hadoop MapReduce
  • Apache Spark

Installing and Configuring Apache Hadoop MapReduce

Installing and Configuring Apache Spark

Using Predictive Modeling for Health Data

Using Apache Hadoop MapReduce for Health Data

Performing Phenotyping & Clustering on Health Data

  • Classification Evaluation Metrics
  • Classification Ensemble Methods

Using Apache Spark for Health Data

Working with Medical Ontology

Using Graph Analysis on Health Data

Dimensionality Reduction on Health Data

Working with Patient Similarity Metrics

Troubleshooting

Summary and Conclusion

Requirements

  • An understanding of machine learning and data mining concepts
  • Advanced programming experience (Python, Java, Scala)
  • Proficiency in data and ETL processes
  21 Hours
 

Testimonials

Related Courses

Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP

 21 hours

This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and

Apache Spark MLlib

 35 hours

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative

Alluxio: Unifying Disparate Storage Systems

 7 hours

Alluxio is an open-source virtual distributed storage system that unifies disparate storage systems and enables applications to interact with data at memory speed. It is used by companies such as Intel, Baidu and Alibaba. In this instructor-led,

Apache Ambari: Efficiently Manage Hadoop Clusters

 21 hours

Apache Ambari is an open-source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. In this instructor-led live training participants will learn the management tools and practices provided by Ambari to

Introduction to Graph Computing

 28 hours

Many real world problems can be described in terms of graphs. For example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set

Hortonworks Data Platform (HDP) for Administrators

 21 hours

Hortonworks Data Platform (HDP) is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem. This instructor-led, live training (online or onsite) introduces

Data Analysis with Hive/HiveQL

 7 hours

This course covers how to use Hive SQL language (AKA: Hive HQL, SQL on Hive, HiveQL) for people who extract data from Hive

Impala for Business Intelligence

 21 hours

Cloudera Impala is an open source massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters. Impala enables users to issue low-latency SQL queries to data stored in Hadoop Distributed File System and Apache

A Practical Introduction to Stream Processing

 21 hours

Stream Processing refers to the real-time processing of "data in motion", that is, performing computations on data as it is being received. Such data is read as continuous streams from data sources such as sensor events, website user

Magellan: Geospatial Analytics on Spark

 14 hours

Magellan is an open-source distributed execution engine for geospatial analytics on big data. Implemented on top of Apache Spark, it extends Spark SQL and provides a relational abstraction for geospatial analytics. This instructor-led, live

Apache Spark in the Cloud

 21 hours

Apache Spark's learning curve is slowly increasing at the begining, it needs a lot of effort to get the first return. This course aims to jump through the first tough part. After taking this course the participants will understand the

Spark for Developers

 21 hours

OBJECTIVE: This course will introduce Apache Spark. The students will learn how  Spark fits  into the Big Data ecosystem, and how to use Spark for data analysis.  The course covers Spark shell for interactive data analysis, Spark

Apache Spark SQL

 7 hours

Spark SQL is Apache Spark's module for working with structured and unstructured data. Spark SQL provides information about the structure of the data as well as the computation being performed. This information can be used to perform

Python and Spark for Big Data (PySpark)

 21 hours

Python is a high-level programming language famous for its clear syntax and code readibility. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python. In this