9 BEDtools and plyranges
9.1 Setting up for plyranges
Create a new R project and load in the tidyverse and plyranges into your environment:
Download the files MACS2 narrowPeak BED file (from the STAR tutorial), as well as a GTF (i.e. GFF3)-formatted file containing mouse gene information (from the ChIP tutorial) from the Orca server. The files have been stripped of non-autosomal and non-canonical chromosomes for compatibility (different patches of mm10 will have different names for mitochondria and different “alternate” chromosomes, which makes plyranges angry).
9.2 Genome arithmetic
Let’s start simple: what is a command you can run to display the unique seqnames (i.e. chromosomes) of each file? Remember that these are annotations on a mouse genome!
Perform a left join (i.e., keep the original records of
GRanges A, and attach metadata of intersectingGRanges B) ofNaive_H3K27ac_peaks.autosomes.narrowPeakandMus_musculus.GRCm38.84.chr.autosomes.gtf, but keep the metadata columnsname,signalValue,qValue,type, andgene_idof the combinedGRangesobject.Using the previous GRanges object, group the ranges by
typeand summarize the meansignalValuefor eachtype. Remember that yoursignalValuecomes from the H3K27ac peak calls, and yourtypecolumn contains known classifications of genomic areas. Whattypehas the highest meansignalValue? Does this make sense given the function of H3K27ac in mammalian genomes?Unfortunately, a left join means that some of your peaks will not have any genome information attached to them, as they did not overlap anything. What is a command that you could run to annotate the nearest genome information onto your peaks?
Can you think of any applications for
plyrangesthat might be useful in your final project? Discuss with your group!