Course Outline

I. Introduction and preliminaries

1. Overview

  • Making R more friendly, R and available GUIs
  • RStudio
  • Related software and documentation
  • R and statistics
  • Using R interactively
  • An introductory session
  • Getting help with functions and features
  • R commands, case sensitivity, etc.
  • Recall and correction of previous commands
  • Executing commands from or diverting output to a file
  • Data permanency and removing objects
  • Good programming practice: self-contained scripts and good readability, e.g. structured scripts, documentation, markdown
  • Installing packages; CRAN and Bioconductor

2. Reading data

  • TXT files (read.delim)
  • CSV files

3. Simple manipulations; numbers, vectors and arrays

  • Vectors and assignment
  • Vector arithmetic
  • Generating regular sequences
  • Logical vectors
  • Missing values
  • Character vectors
  • Index vectors; selecting and modifying subsets of a data set
  • Arrays
  • Array indexing. Subsections of an array
  • Index matrices
  • The array() function and simple operations on arrays, e.g. multiplication, transposition
  • Other types of objects

4. Lists and data frames

  • Lists
  • Constructing and modifying lists
    • Concatenating lists
  • Data frames
    • Making data frames
    • Working with data frames
    • Attaching arbitrary lists
    • Managing the search path

5. Data manipulation

  • Selecting, subsetting observations and variables
  • Filtering, grouping
  • Recoding, transformations
  • Aggregation, combining data sets
  • Forming partitioned matrices, cbind() and rbind()
  • The concatenation function, c(), with arrays
  • Character manipulation, stringr package
  • Short introduction to grep and regexpr

6. More on reading data

  • XLS, XLSX files
  • readr and readxl packages
  • SPSS, SAS, Stata and other data formats
  • Exporting data to txt, csv and other formats

7. Grouping, loops and conditional execution

  • Grouped expressions
  • Control statements
  • Conditional execution: if statements
  • Repetitive execution: for loops, repeat and while
  • Introduction to apply, lapply, sapply, tapply

8. Functions

  • Creating functions
  • Optional arguments and default values
  • Variable number of arguments
  • Scope and its consequences

9. Simple graphics in R

  • Creating a Graph
  • Density Plots
  • Dot Plots
  • Bar Plots
  • Line Charts
  • Pie Charts
  • Boxplots
  • Scatter Plots
  • Combining Plots

II. Statistical analysis in R 

1. Probability distributions

  • R as a set of statistical tables
  • Examining the distribution of a set of data

2. Testing of Hypotheses

  • Tests about a Population Mean
  • Likelihood Ratio Test
  • One- and two-sample tests
  • Chi-Square Goodness-of-Fit Test
  • Kolmogorov-Smirnov One-Sample Statistic
  • Wilcoxon Signed-Rank Test
  • Two-Sample Test
  • Wilcoxon Rank Sum Test
  • Mann-Whitney Test
  • Kolmogorov-Smirnov Test

3. Multiple Testing of Hypotheses

  • Type I Error and FDR
  • ROC curves and AUC
  • Multiple Testing Procedures (BH, Bonferroni etc.)

4. Linear regression models

  • Generic functions for extracting model information
  • Updating fitted models
  • Generalized linear models
    • Families
    • The glm() function
  • Classification
    • Logistic Regression
    • Linear Discriminant Analysis
  • Unsupervised learning
    • Principal Components Analysis
    • Clustering Methods (k-means, hierarchical clustering, k-medoids)

5. Survival analysis (survival package)

  • Survival objects in R
  • Kaplan-Meier estimate, log-rank test, parametric regression
  • Confidence bands
  • Censored (interval censored) data analysis
  • Cox PH models, constant covariates
  • Cox PH models, time-dependent covariates
  • Simulation: Model comparison (Comparing regression models)

6. Analysis of Variance

  • One-Way ANOVA
  • Two-Way Classification of ANOVA
  • MANOVA

III. Worked problems in bioinformatics           

  • Short introduction to limma package
  • Microarray data analysis workflow
  • Data download from GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397
  • Data processing (QC, normalisation, differential expression)
  • Volcano plot
  • Clustering examples and heatmaps
  28 Hours
 
