Data science
Introduction to data science
- Introduction
- Define data science
- List common tools used in data science
Command line
Introduction to Unix
- Introduction
- Define command line
- Describe several advantages of using the command line
- Download instructions
- Provides instructions for downloading and installing Unix terminals on Mac, Linux, and Windows
- Unix navigation tutorial and practice
- Define parts of the terminal
- Use Unix commands to navigate your computer including pwd, ls, man/help, and cd
- Unix manipulation tutorial and practice
- Use Unix commands to manipulate files including mkdir, cp, mv, and rm
- Apply equivalent file paths in Unix commands
- Define best practices for directory and file names
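As a rough illustration of the navigation and manipulation commands listed above, the sketch below runs them in a throwaway directory. The directory and file names (project_data, raw_counts.csv, counts_copy.csv) are hypothetical examples, not files from the tutorials.

```shell
# Work inside a temporary directory so no existing files are touched.
workdir="$(mktemp -d)"
cd "$workdir"

pwd                    # print the current working directory
mkdir project_data     # create a directory (name has no spaces)
cd project_data        # move into the new directory

# Create a small example file (hypothetical contents).
printf 'sample_id,count\nA,1\n' > raw_counts.csv

cp raw_counts.csv raw_counts_backup.csv    # copy a file
mv raw_counts_backup.csv counts_copy.csv   # rename (move) the copy
ls                                         # list directory contents

# Equivalent file paths: a relative and an absolute path to the same file.
ls ./raw_counts.csv
ls "$workdir/project_data/raw_counts.csv"

rm counts_copy.csv     # delete a file; this cannot be undone
cd ..                  # move up one directory
```

The names use underscores rather than spaces, which is one common naming practice; names containing spaces must be quoted or escaped on the command line.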
Applications of command line
- BLAST tutorial and practice
- Complete nucleotide BLAST of a large sequencing dataset using command line tools
- Git tutorial and practice
- Enact version control on a text file using Git command line tools
- GitHub tutorial and practice
- Share and modify a version controlled file using GitHub
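A minimal sketch of what version control on a text file with Git can look like is shown below. The file name notes.txt, the commit messages, and the user name and email are placeholder assumptions, and the example stays local (it does not push to GitHub).

```shell
# Start a new repository in a temporary directory.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Placeholder identity for this repository only (assumed values).
git config user.name "Example Student"
git config user.email "student@example.com"

# Create a text file and record its first version.
printf 'First draft\n' > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

# Change the file and record a second version.
printf 'Second line\n' >> notes.txt
git add notes.txt
git commit -q -m "Add second line to notes"

# Show the recorded history, one commit per line.
git log --oneline
```

Each commit stores a snapshot of the staged file, so earlier versions remain recoverable from the history.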
R/RStudio
Introduction to R
- Introduction
- Describe general uses for R
- List several advantages to using R and RStudio
- Download instructions
- Provides instructions for downloading and installing R and RStudio
- RStudio tutorial
- Navigate the RStudio software including key shortcuts, projects, packages, and help
- All of our R tutorials and practice are implemented in RStudio, so we strongly recommend including this tutorial in any R curriculum
- Base R tutorial and practice
- Execute commands in base R to:
- Load tabular data
- Access columns and rows within a data frame
- Perform basic calculations on tabular data
- Subset a data frame
Data manipulation in R
- Data manipulation tutorial and practice
- Load tabular data using the tidyverse
- Subset and clean data in dplyr (filter, select, rename, arrange, mutate)
- Summarize data in dplyr (group_by, summarize)
- Transform data frames using tidyr (gather, spread) and dplyr (*_join)
- Link multiple tidyverse functions using pipes (%>%)
Data visualization in R
- Data visualization tutorial and practice
- Define the grammar of graphics
- Create scatterplots using the ggplot2 package
- Customize plot color, shape, axes, scales, and other attributes
- Represent subsets of data using facets
- We recommend first completing ‘Data manipulation in R’
Statistics (under development)
Introduction to statistics
- Introduction
- Identify and distinguish between a population and a sample, and between parameters and statistics
- Define “p-value” and interpret its meaning
- Identify factors that influence statistical test selection
Statistics in R/RStudio
- Download instructions
- Provides instructions for downloading and installing R and RStudio
- RStudio tutorial
- Navigate the RStudio software including key shortcuts, projects, packages, and help
- All of our statistics tutorials and practice are implemented in RStudio, so we strongly recommend including this tutorial in any R curriculum
- t-tests
- Analysis of Variance (ANOVA)
- Linear regression
Capstone projects (under development)
Microbiome analysis team project
- Focuses on biological interpretation of amplicon sequence data from microbiomes