Course Outline
I. Introduction and preliminaries
1. Overview
- Making R more friendly, R and available GUIs
- RStudio
- Related software and documentation
- R and statistics
- Using R interactively
- An introductory session
- Getting help with functions and features
- R commands, case sensitivity, etc.
- Recall and correction of previous commands
- Executing commands from or diverting output to a file
- Data permanency and removing objects
- Good programming practice: self-contained scripts and good readability (e.g. structured scripts, documentation, markdown)
- Installing packages; CRAN and Bioconductor
2. Reading data
- Txt files (read.delim)
- CSV files
3. Simple manipulations; numbers and vectors + arrays
- Vectors and assignment
- Vector arithmetic
- Generating regular sequences
- Logical vectors
- Missing values
- Character vectors
- Index vectors; selecting and modifying subsets of a data set
- Arrays
- Array indexing. Subsections of an array
- Index matrices
- The array() function and simple operations on arrays, e.g. multiplication, transposition
- Other types of objects
4. Lists and data frames
- Lists
- Constructing and modifying lists
- Concatenating lists
- Data frames
- Making data frames
- Working with data frames
- Attaching arbitrary lists
- Managing the search path
5. Data manipulation
- Selecting, subsetting observations and variables
- Filtering, grouping
- Recoding, transformations
- Aggregation, combining data sets
- Forming partitioned matrices, cbind() and rbind()
- The concatenation function, c(), with arrays
- Character manipulation, stringr package
- Short introduction to grep and regexpr
6. More on Reading data
- XLS, XLSX files
- readr and readxl packages
- SPSS, SAS, Stata, and other data formats
- Exporting data to txt, csv and other formats
7. Grouping, loops and conditional execution
- Grouped expressions
- Control statements
- Conditional execution: if statements
- Repetitive execution: for loops, repeat and while
- Introduction to apply, lapply, sapply, tapply
8. Functions
- Creating functions
- Optional arguments and default values
- Variable number of arguments
- Scope and its consequences
9. Simple graphics in R
- Creating a Graph
- Density Plots
- Dot Plots
- Bar Plots
- Line Charts
- Pie Charts
- Boxplots
- Scatter Plots
- Combining Plots
II. Statistical analysis in R
1. Probability distributions
- R as a set of statistical tables
- Examining the distribution of a set of data
2. Testing of Hypotheses
- Tests about a Population Mean
- Likelihood Ratio Test
- One- and two-sample tests
- Chi-Square Goodness-of-Fit Test
- Kolmogorov-Smirnov One-Sample Statistic
- Wilcoxon Signed-Rank Test
- Two-Sample Test
- Wilcoxon Rank Sum Test
- Mann-Whitney Test
- Kolmogorov-Smirnov Test
3. Multiple Testing of Hypotheses
- Type I Error and FDR
- ROC curves and AUC
- Multiple Testing Procedures (BH, Bonferroni etc.)
4. Linear regression models
- Generic functions for extracting model information
- Updating fitted models
- Generalized linear models
- Families
- The glm() function
- Classification
- Logistic Regression
- Linear Discriminant Analysis
- Unsupervised learning
- Principal Components Analysis
- Clustering Methods (k-means, hierarchical clustering, k-medoids)
5. Survival analysis (survival package)
- Survival objects in R
- Kaplan-Meier estimate, log-rank test, parametric regression
- Confidence bands
- Censored (interval censored) data analysis
- Cox PH models, constant covariates
- Cox PH models, time-dependent covariates
- Simulation: Model comparison (Comparing regression models)
6. Analysis of Variance
- One-Way ANOVA
- Two-Way Classification of ANOVA
- MANOVA
III. Worked problems in bioinformatics
- Short introduction to limma package
- Microarray data analysis workflow
- Data download from GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397
- Data processing (QC, normalisation, differential expression)
- Volcano plot
- Clustering examples + heatmaps
Testimonials
He was very informative and helpful.
Pratheep Ravy
I got answers to all my questions.
Natalia Gladii
The trainer was so knowledgeable and included areas I was interested in.
Mohamed Salama
Very tailored to needs.
Yashan Wang
I genuinely enjoyed working 1:1 with Gunner.
Bryant Ives
I liked the new insights in deep machine learning.
Josip Arneric
We gained some knowledge about NN in general, and what was the most interesting for me were the new types of NN that are popular nowadays.
Tea Poklepovic
I mostly enjoyed the graphs in R :))).
Faculty of Economics and Business Zagreb
The flexible and friendly style. Learning exactly what was useful and relevant for me.
Jenny Tickner
I enjoyed the Excel sheets provided having the exercises with examples. This meant that if Tamil was held up helping other people, I could crack on with the next parts.
Luke Pontin
Learning how to use excel properly.
Torin Mitchell
The way the trainer made complex subjects easy to understand.
Adam Drewry
Detailed and comprehensive instruction given by experienced and clearly knowledgeable expert on the subject.
Justin Roche
Tamil is a very knowledgeable and nice person; I have learned a lot from him.
Aleksandra Szubert
I liked the first session. Very intensive and quick.
Digital Jersey
I mostly liked the patience of Tamil.
Laszlo Maros
I really benefited from the real-life practical examples.
Wioleta (Vicky) Celinska-Drozd
A lot of knowledge - theoretical and practical.
Anna Alechno
I genuinely liked his knowledge and practical examples.
Irina Tulgara
Overview and understanding how big the topic is.
British American Shared Services Europe BAT GBS Finance, WER/Centre/EEMEA
Hands on examples were the most helpful.
Sean Kaukas
Michael the trainer is very knowledgeable and skillful in the subject of Big Data and R. He is very flexible and quickly customizes the training to meet clients' needs. He is also very capable of solving technical and subject-matter problems on the go. Fantastic and professional training!
Xiaoyuan Geng - Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
I really enjoyed the introduction of new packages.
Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
The tutor, Mr. Michael An, interacted with the audience very well, and the instruction was clear. The tutor also went to the extent of adding more information based on requests from the students during the training.
Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
The subject matter and the pace were perfect.
Tim - Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
Good overview of R and good range of topics. Trainer was happy to answer all questions.
Symphony EYC
I really enjoyed the knowledge of the trainer.
Stephanie Seiermann
It was very informative and professionally held. Wojtek's knowledge level was so advanced that he could basically answer any question, and he was willing to put effort into fitting the training to my personal needs.
Sonja Steiner - BearingPoint GmbH
I really liked the exercises on time series modeling.
Teleperformance
New tool which is “R” and I find it interesting to know the existence of such tool for data analysis.
Michael Lopez - Teleperformance
The tool was interesting and I see the use. I would like to learn more about it.
- Teleperformance
The matter was well presented and in an orderly manner.
Marylin Houle - Ivanhoe Cambridge
He really explained everything well and used examples.
- Royal Bank of Canada
I enjoyed the self-learning through exercises and the tips and shortcuts shared.
- Competition Bureau
I benefited from the good examples and the opportunity to follow along.
- Environmental and Climate Change Canada
I genuinely enjoyed the hands-on exercises.
Yunfa Zhu - Environmental and Climate Change Canada
The trainer, the food, and the space were all great.
- Canada Revenue Agency
I like actually writing code with sample data and annotating the script for future reference.
- Canada Revenue Agency
The pre-made scripts used for training material were very useful. The interactive training allowed for a clear understanding of each topic.
- Canada Revenue Agency
The trainer was very concerned about individual understanding.
Muhammad Surajo Sanusi - Birmingham City University
Excellent presentation and it gives me confidence to build on knowledge gained.
- Birmingham City University
Background knowledge and 'provenance' of trainer.
Francis McGonigal - Birmingham City University
Resources
Hafiz Rana - Birmingham City University
Good explanations on how we do things
- Birmingham City University
I feel more confident with coding now. I've never done it before but now I understand that it's not rocket science and I can do it when necessary.
Anna Yartseva - Birmingham City University
Modeling and how to fit the data to model
- USDA
The remote classroom setting worked very well
- Trimac Management Services LP
Good detail on what R is used for and how to start using it right away
Hoss Shenassa - Trimac Management Services LP
The many practical examples / assignments that we went through were great. For me, I learn better by seeing examples and applying them elsewhere. The use of real data and applying what was taught against it was extremely valuable. Michael's PowerPoint presentations and his ability to work through each solution were invaluable.
- Trimac Management Services LP
The exercises.
Elena Velkova - CEED Bulgaria