10 GO term enrichment analysis
10.1 Setting up for GO term enrichment
Create a new script or R markdown in the same project you used for the DESeq2 tutorial and install and load the required packages. Since these are Bioconductor packages, to install them you must use BiocManager::install()
.
Load the necessary packages.
library(AnnotationDbi)
library(org.Mm.eg.db)
library(GO.db)
library(GOstats)
Copy the significant results from the DESeq2 tutorial into a new file that we will be modifying.
annotated_significant_results <- significant_res
The row names of your significant genes from the DEseq2 tutorial are mouse Ensembl Gene IDs. Convert the Ensemble Gene IDs in the rownames to Entrez IDs as a new column (and add the symbols too!).
annotated_significant_results$symbol <- mapIds(
org.Mm.eg.db,
keys = rownames(annotated_significant_results),
keytype = "ENSEMBL",
column = "SYMBOL",
multiVals = "first"
)
## 'select()' returned 1:many mapping between keys and columns
annotated_significant_results$entrez <- mapIds(
org.Mm.eg.db,
keys = rownames(annotated_significant_results),
keytype = "ENSEMBL",
column = "ENTREZID",
multiVals = "first"
)
## 'select()' returned 1:many mapping between keys and columns
Create a non-redundant list of genes from your enriched list.
Filter your significant genes by log2FoldChange
to pull out upregulated genes.
genes_upregulated <- annotated_significant_results %>%
as.data.frame() %>%
filter(log2FoldChange > 4) %>%
pull(entrez) %>%
unique()
Create GO hyperGTest
object from a new GOHyperGParams
object that you will create with your upregulated terms and gene IDs, looking in the Biological Process (“BP”) ontology.
go_bp_upregulated <- hyperGTest(new("GOHyperGParams",
geneIds = genes_upregulated,
universeGeneIds = all_genes,
annotation = "org.Mm.eg.db",
ontology = "BP",
pvalueCutoff = 0.01,
conditional = FALSE,
testDirection = "over"))
Convert your hyperGTest object into a viewable format, and print the top 10 enriched GO terms.
## GOBPID Pvalue OddsRatio ExpCount Count Size
## 1 GO:0007600 3.053786e-20 3.661730 55.418934 118 220
## 2 GO:0050877 7.971929e-20 2.596491 108.822633 192 432
## 3 GO:0003008 9.402210e-17 2.064621 172.554407 264 685
## 4 GO:0007608 6.912100e-16 17.745569 10.328074 35 41
## 5 GO:0007606 5.604887e-15 9.620635 13.602829 41 54
## 6 GO:0032501 7.256714e-12 1.524856 642.859630 752 2552
## 7 GO:0007186 7.555729e-12 2.522593 61.968444 110 246
## 8 GO:0019228 3.498883e-11 31.645833 5.793798 21 23
## 9 GO:0098660 4.808290e-11 2.390277 64.991295 112 258
## 10 GO:0098655 6.360296e-11 2.343399 67.510337 115 268
## Term
## 1 sensory perception
## 2 nervous system process
## 3 system process
## 4 sensory perception of smell
## 5 sensory perception of chemical stimulus
## 6 multicellular organismal process
## 7 G protein-coupled receptor signaling pathway
## 8 neuronal action potential
## 9 inorganic ion transmembrane transport
## 10 cation transmembrane transport
10.2 Additional questions
Now, how could you pull out downregulated Biological Process terms in annotated_significant_results
? How about upregulated terms for one of the other gene ontologies?
What is the general pattern of upregulated and downregulated GO terms in the Biological Process ontology?