Data science

Introduction to data science

  • Introduction
    • Define data science
    • List common tools used in data science

Command line

Introduction to Unix

  • Introduction
    • Define command line
    • Describe several advantages to using command line
  • Download instructions
    • Provides instructions for downloading and installing a Unix terminal on Mac, Linux, and Windows
  • Unix navigation tutorial and practice
    • Define parts of the terminal
    • Use Unix commands to navigate your computer, including pwd, ls, man/help, and cd
  • Unix manipulation tutorial and practice
    • Use Unix commands to manipulate files, including mkdir, cp, mv, and rm
    • Use equivalent file paths (e.g., absolute and relative) in Unix commands
    • Define best practices for directory and file names
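The navigation and file-manipulation commands above can be sketched as a short shell session. This is a minimal illustration, not part of the tutorial materials: it works inside a throwaway directory from mktemp, and the names unix_practice, raw_data, notes.txt, and notes_backup.txt are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of Unix navigation and file manipulation.
# Assumption: all directory and file names below are hypothetical examples.
set -e

cd "$(mktemp -d)"                  # work in a scratch directory so nothing real is touched
pwd                                # print the current working directory
# man ls                           # open the manual page for ls (interactive; press q to quit)

mkdir -p unix_practice/raw_data    # create a project folder with a subfolder
cd unix_practice                   # move into the project folder
echo "sample notes" > raw_data/notes.txt   # create a small text file to practice on

ls raw_data                        # list contents using a relative path
cp raw_data/notes.txt notes_copy.txt          # copy a file
mv notes_copy.txt raw_data/notes_backup.txt   # move and rename in one step
ls "$PWD/raw_data"                 # list the same directory using an absolute path
rm raw_data/notes_backup.txt       # delete a file (rm has no undo)
ls raw_data
```

Listing raw_data by a relative path and by "$PWD/raw_data" shows two equivalent paths to the same directory.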

Applications of command line

  • BLAST tutorial and practice
    • Run a nucleotide BLAST search on a large sequencing dataset using command-line tools
  • Git tutorial and practice
    • Apply version control to a text file using Git command-line tools
  • GitHub tutorial and practice
    • Share and modify a version-controlled file using GitHub
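Local version control of a single text file with Git can be sketched as follows. This is a hedged illustration rather than the tutorial's own exercise: the repository name version_demo, the file notes.txt, and the identity passed with -c are hypothetical placeholders so the commits succeed on a machine where Git is not yet configured.

```shell
#!/bin/sh
# Minimal sketch of version control on one text file with Git.
# Assumptions: git is installed; names and the commit identity are hypothetical.
set -e

cd "$(mktemp -d)"                  # scratch location for the demo repository
git init -q version_demo           # create a new, empty repository
cd version_demo

echo "First draft" > notes.txt
git add notes.txt                  # stage the file
git -c user.name="Student" -c user.email="student@example.com" \
    commit -q -m "Add first draft of notes"

echo "Second line" >> notes.txt    # modify the tracked file
git diff                           # show the unstaged change
git add notes.txt
git -c user.name="Student" -c user.email="student@example.com" \
    commit -q -m "Add a second line to notes"

git log --oneline                  # one line per commit, newest first
```

Sharing this repository on GitHub would additionally involve adding a remote (git remote add origin followed by the repository URL) and running git push; those steps need a GitHub account and are left out of the sketch.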

R/RStudio

Introduction to R

  • Introduction
    • Describe general uses for R
    • List several advantages to using R and RStudio
  • Download instructions
    • Provides instructions for downloading and installing R and RStudio
  • RStudio tutorial
    • Navigate RStudio, including keyboard shortcuts, projects, packages, and help
    • All of our R tutorials and practice exercises are implemented in RStudio, so we strongly recommend including this tutorial in any R curriculum
  • Base R tutorial and practice
    • Execute commands in base R to:
      • Load tabular data
      • Access columns and rows within a data frame
      • Perform basic calculations on tabular data
      • Subset a data frame

Data manipulation in R

  • Data manipulation tutorial and practice
    • Load tabular data using the tidyverse
    • Subset and clean data in dplyr (filter, select, rename, arrange, mutate)
    • Summarize data in dplyr (group_by, summarize)
    • Transform data frames using tidyr (gather, spread) and dplyr (*_join)
    • Chain multiple tidyverse functions using the pipe operator (%>%)

Data visualization in R

  • Data visualization tutorial and practice
    • Define the grammar of graphics
    • Create scatterplots using the ggplot2 package
    • Customize plot color, shape, axes, scales, and other attributes
    • Represent subsets of data using facets
    • We recommend completing ‘Data manipulation in R’ first

Statistics (under development)

Introduction to statistics

  • Introduction
    • Identify and distinguish between a population and a sample, and between parameters and statistics
    • Define “p-value” and interpret its meaning
    • Identify factors that influence statistical test selection
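As a compact reference for the objectives above (standard textbook notation, not taken from the tutorials themselves), a population parameter such as the mean \mu is estimated by a sample statistic such as \bar{x}. A two-sided p-value is the probability, computed assuming the null hypothesis H_0 is true, of obtaining a test statistic T at least as extreme as the observed value t_obs:

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i
  &&\text{(parameter: all } N \text{ members of the population)} \\
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
  &&\text{(statistic: a sample of size } n\text{)} \\
p &= \Pr\left(|T| \ge |t_{\text{obs}}| \mid H_0\right)
  &&\text{(two-sided p-value)}
\end{aligned}
```

A small p-value means the observed data would be unusual if H_0 were true; it is not the probability that H_0 itself is true.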

Statistics in R/RStudio

  • Download instructions
    • Provides instructions for downloading and installing R and RStudio
  • RStudio tutorial
    • Navigate RStudio, including keyboard shortcuts, projects, packages, and help
    • All of our statistics tutorials and practice exercises are implemented in RStudio, so we strongly recommend including this tutorial in any R curriculum
  • t-tests
  • Analysis of Variance (ANOVA)
  • Linear regression

Capstone projects (under development)

Metagenomics analysis team project

  • Focuses on pipeline construction and biological interpretation of metagenomic sequence data from microbiomes

Microbiome analysis team project

  • Focuses on biological interpretation of amplicon sequence data from microbiomes