1 About

This is an open-source textbook aimed at introducing data science and R programming to undergraduate and graduate students. It was originally written as a learning tool for the ECOSCOPE workshops hosted at the University of British Columbia.

Each chapter of the book corresponds to a separate workshop.

1.1 Schedule

2021 Term 1 (Sept - Dec)

Date                          Time     Workshop
Sept 21 (Tues)                2-4 PM   Intro to R
Sept 28 & 30 (Tues & Thurs)   1-4 PM   Intro to the tidyverse
Oct 19 (Tues)                 2-4 PM   Intro to R
Oct 26 & 28 (Tues & Thurs)    1-4 PM   Statistical models in R
Nov 16 (Tues)                 2-4 PM   Intro to R
Nov 23 & 25 (Tues & Thurs)    1-4 PM   Intermediate R programming

1.2 Workshops

1.2.1 Introduction to R and RStudio

Author(s): Gil B. Henriques, Florent Mazel, Yue Liu, Kim Dill-McFarland & Stephan Koenig

This is an introductory workshop for complete beginners with no prior experience in R. In this condensed 2-hour workshop, we introduce R and RStudio to get you started. It is also a prerequisite for our more advanced workshops.

In it, we cover:

  • R and RStudio
  • RStudio projects
  • R scripts
  • Installing packages
  • Reading in data as a data frame
  • Vectors, single values, and data types
  • Basic data visualization
  • The help function

1.2.2 Introduction to the tidyverse

Author(s): Gil B. Henriques, Kim Dill-McFarland, Kris Hong & Stephan Koenig

In this workshop, we provide a brief introduction to RStudio, a powerful but user-friendly R environment, and show you how to use it effectively. We then delve into data manipulation and graphics in the tidyverse, including the packages dplyr, tidyr, and ggplot2. We teach different ways to manipulate data in tabular and text form. We also cover the key concepts underlying the grammar of graphics and how they are implemented in ggplot2.

You will learn how to:

  • create an R project and import data from a file into R,
  • create subsets of rows or columns from data frames using dplyr,
  • select pieces of an object by indexing using element names or position,
  • change your data frames between wide and narrow formats,
  • create various types of graphics,
  • modify the various features of a graphic, and
  • save your graphic in various formats.

1.2.3 Reproducible research

Author(s): Gil J. B. Henriques, Kim Dill-McFarland, Kris Hong & Stephan Koenig

In this workshop, we introduce computational reproducibility and its importance to modern research. We teach the general principles of reproducible computer-based analysis, along with specific methods and tools for reproducibility and version control using RStudio, Git, and GitHub.

You will learn how to:

  • construct reproducible, automatable workflows in R with scripts and Make.
  • create reproducible documents with R Markdown that combine the underlying code with the resulting graphical and statistical output, in several formats (reports, presentation slides, handouts, notes).
  • use Git version control.
  • integrate version control with GitHub for both personal and group projects.

1.2.4 Statistical models in R

Author(s): Gil J. B. Henriques & Andrew Li, based on notes by Yue Liu and Kim Dill-McFarland, in collaboration with the Applied Statistics and Data Science Group

In this workshop, we introduce various types of regression models and how they are implemented in R. We cover:

  • linear regression, ANOVA, ANCOVA, and mixed-effects models for continuous response data;
  • logistic regression for binary response data; and
  • Poisson and negative binomial regression for count response data.

You will learn:

  • the assumptions behind the different models.
  • how to interpret the main effects and interaction terms in a model.
  • experimental design concepts that help maximize statistical power.

In R, you will learn how to:

  • build a statistical model.
  • define and manipulate model terms.
  • use the lsmeans package to answer specific research questions.

1.2.5 Intermediate R programming

Author(s): Kim Dill-McFarland & Andrew Li, in collaboration with the Applied Statistics and Data Science Group

In this workshop, we teach you to use R as a programming environment, so that you can write more complex, yet clearer, data analysis code. We cover three fundamental concepts of R programming: functions, classes, and packages.

You will learn how to:

  • define objects, classes, and attributes in data and built-in functions.
  • write functions and for loops.
  • output large result tables to your hard drive.
  • write and publish an R package.
  • write formal automated tests (aka unit testing).

License: GPL v3