10 GO term enrichment analysis

Slides

You can download the slides for this tutorial below.

10.1 Setting up for GO term enrichment

Create a new script or R markdown in the same project you used for the DESeq2 tutorial and install and load the required packages. Since these are Bioconductor packages, to install them you must use BiocManager::install().

Load the necessary packages.

library(AnnotationDbi)
library(org.Mm.eg.db)
library(GO.db)
library(GOstats)

Copy the significant results from the DESeq2 tutorial into a new file that we will be modifying.

annotated_significant_results <- significant_res

The row names of your significant genes from the DEseq2 tutorial are mouse Ensembl Gene IDs. Convert the Ensemble Gene IDs in the rownames to Entrez IDs as a new column (and add the symbols too!).

annotated_significant_results$symbol <- mapIds(
  org.Mm.eg.db,
  keys = rownames(annotated_significant_results),
  keytype = "ENSEMBL",
  column = "SYMBOL",
  multiVals = "first"
)
## 'select()' returned 1:many mapping between keys and columns
annotated_significant_results$entrez <- mapIds(
  org.Mm.eg.db,
  keys = rownames(annotated_significant_results),
  keytype = "ENSEMBL",
  column = "ENTREZID",
  multiVals = "first"
)
## 'select()' returned 1:many mapping between keys and columns

Create a non-redundant list of genes from your enriched list.

all_genes <- annotated_significant_results %>% 
  as.data.frame() %>% 
  pull(entrez) %>% 
  unique()

Filter your significant genes by log2FoldChange to pull out upregulated genes.

genes_upregulated <- annotated_significant_results %>% 
  as.data.frame() %>% 
  filter(log2FoldChange > 4) %>% 
  pull(entrez) %>% 
  unique()

Create GO hyperGTest object from a new GOHyperGParams object that you will create with your upregulated terms and gene IDs, looking in the Biological Process (“BP”) ontology.

go_bp_upregulated <- hyperGTest(new("GOHyperGParams",
                                    geneIds = genes_upregulated,
                                    universeGeneIds = all_genes,
                                    annotation = "org.Mm.eg.db",
                                    ontology = "BP",
                                    pvalueCutoff = 0.01,
                                    conditional = FALSE,
                                    testDirection = "over"))

Convert your hyperGTest object into a viewable format, and print the top 10 enriched GO terms.

go_bp_upregulated %>% summary() %>% head(10)
##        GOBPID       Pvalue OddsRatio   ExpCount Count Size
## 1  GO:0007600 3.053786e-20  3.661730  55.418934   118  220
## 2  GO:0050877 7.971929e-20  2.596491 108.822633   192  432
## 3  GO:0003008 9.402210e-17  2.064621 172.554407   264  685
## 4  GO:0007608 6.912100e-16 17.745569  10.328074    35   41
## 5  GO:0007606 5.604887e-15  9.620635  13.602829    41   54
## 6  GO:0032501 7.256714e-12  1.524856 642.859630   752 2552
## 7  GO:0007186 7.555729e-12  2.522593  61.968444   110  246
## 8  GO:0019228 3.498883e-11 31.645833   5.793798    21   23
## 9  GO:0098660 4.808290e-11  2.390277  64.991295   112  258
## 10 GO:0098655 6.360296e-11  2.343399  67.510337   115  268
##                                            Term
## 1                            sensory perception
## 2                        nervous system process
## 3                                system process
## 4                   sensory perception of smell
## 5       sensory perception of chemical stimulus
## 6              multicellular organismal process
## 7  G protein-coupled receptor signaling pathway
## 8                     neuronal action potential
## 9         inorganic ion transmembrane transport
## 10               cation transmembrane transport

10.2 Additional questions

Now, how could you pull out downregulated Biological Process terms in annotated_significant_results? How about upregulated terms for one of the other gene ontologies?

What is the general pattern of upregulated and downregulated GO terms in the Biological Process ontology?

10.3 Additional resources

Gene Ontology overview