9 BEDtools and plyranges

Slides

You can download the slides for this tutorial below.

9.1 Setting up for plyranges

Create a new R project, then load the tidyverse and plyranges packages into your environment.
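A minimal sketch of this setup step, assuming both packages are already installed (plyranges is distributed through Bioconductor, so the install line in the comment is only needed once):

```r
# One-time installation, if needed (assumes BiocManager is available):
# BiocManager::install("plyranges")

# Attach both packages for the current session
library(tidyverse)
library(plyranges)
```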

Download the MACS2 narrowPeak BED file (from the ChIP tutorial) and a GTF-formatted file (a GFF2-based format) containing mouse gene information (from the STAR tutorial) from the Orca server. Both files have been stripped of non-autosomal and non-canonical chromosomes for compatibility: different patches of mm10 use different names for the mitochondrial genome and for the “alternate” chromosomes, which makes plyranges angry.

scp ahauduc_mb20@orca1.bcgsc.ca:/projects/micb405/analysis/ChIP_tutorial/Naive_H3K27ac_peaks.autosomes.narrowPeak .
scp ahauduc_mb20@orca1.bcgsc.ca:/projects/micb405/analysis/STAR_tutorial/Mus_musculus.GRCm38.84.chr.autosomes.gtf .
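Once the files are in your project directory, they can be read into GRanges objects. The sketch below assumes the files sit in the working directory under the names used in the scp commands above; the object names `peaks` and `genes` are placeholders chosen for illustration.

```r
library(plyranges)

# File names as downloaded above; adjust the paths if you saved them elsewhere
peak_file <- "Naive_H3K27ac_peaks.autosomes.narrowPeak"
gtf_file  <- "Mus_musculus.GRCm38.84.chr.autosomes.gtf"

# Only attempt the import when both files are actually present
if (file.exists(peak_file) && file.exists(gtf_file)) {
  peaks <- read_narrowpeaks(peak_file)  # MACS2 narrowPeak -> GRanges
  genes <- read_gff(gtf_file)           # GTF annotation -> GRanges
  print(peaks)
  print(genes)
}
```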

9.2 Genome arithmetic

  1. Let’s start simple: what is a command you can run to display the unique seqnames (i.e. chromosomes) of each file? Remember that these are annotations on a mouse genome!

  2. Perform a left join (i.e., keep all the original records of GRanges A, and attach the metadata of any intersecting GRanges B) of Naive_H3K27ac_peaks.autosomes.narrowPeak with Mus_musculus.GRCm38.84.chr.autosomes.gtf, keeping only the metadata columns name, signalValue, qValue, type, and gene_id in the combined GRanges object.

  3. Using the previous GRanges object, group the ranges by type and summarize the mean signalValue for each type. Remember that signalValue comes from the H3K27ac peak calls, and that the type column contains known classifications of genomic regions. Which type has the highest mean signalValue? Does this make sense given the function of H3K27ac in mammalian genomes?

  4. Unfortunately, a left join leaves some of your peaks with no annotation attached, because they did not overlap any feature. What command could you run to annotate each peak with its nearest feature instead?

  5. Can you think of any applications for plyranges that might be useful in your final project? Discuss with your group!
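As a starting point for questions 1 to 4, the operations involved can be sketched on toy data. Everything below is illustrative: the coordinates, scores, and object names (`peaks`, `features`, `joined`, `annotated`, `mean_by_type`, `nearest`) are made up and do not come from the tutorial files, and this is one possible approach rather than the expected answer.

```r
library(plyranges)

# Toy stand-in for the narrowPeak file (all values are made up)
peaks <- as_granges(data.frame(
  seqnames    = c("chr1", "chr1", "chr2"),
  start       = c(100, 500, 50),
  end         = c(200, 600, 150),
  name        = c("peak_1", "peak_2", "peak_3"),
  signalValue = c(10, 4, 6),
  qValue      = c(30, 8, 12)
))

# Toy stand-in for the GTF annotation (all values are made up)
features <- as_granges(data.frame(
  seqnames = c("chr1", "chr1", "chr2"),
  start    = c(150, 550, 400),
  end      = c(300, 580, 500),
  type     = c("gene", "exon", "gene"),
  gene_id  = c("gene_A", "gene_A", "gene_B")
))

# Question 1: which chromosomes appear in each object?
unique(seqnames(peaks))
unique(seqnames(features))

# Question 2: left join on overlaps, then keep only the columns of interest.
# Peaks that overlap nothing are kept, with NA in the annotation columns.
joined <- join_overlap_left(peaks, features)
joined <- select(joined, name, signalValue, qValue, type, gene_id)

# Question 3: mean signalValue per feature type, ignoring unannotated peaks
annotated    <- filter(joined, !is.na(type))
mean_by_type <- summarise(group_by(annotated, type),
                          mean_signal = mean(signalValue))

# Question 4: attach the nearest feature to every peak instead
nearest <- join_nearest(peaks, features)
```

On the real data, the same calls would take the GRanges objects imported from the narrowPeak and GTF files in place of the toy `peaks` and `features`.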