9 BEDtools and plyranges
9.1 Setting up for plyranges
Create a new R project and load in the tidyverse
and plyranges
into your environment:
Download the files MACS2
narrowPeak BED file (from the STAR tutorial), as well as a GTF (i.e. GFF3)-formatted file containing mouse gene information (from the ChIP tutorial) from the Orca server. The files have been stripped of non-autosomal and non-canonical chromosomes for compatibility (different patches of mm10 will have different names for mitochondria and different “alternate” chromosomes, which makes plyranges angry).
9.2 Genome arithmetic
Let’s start simple: what is a command you can run to display the unique seqnames (i.e. chromosomes) of each file? Remember that these are annotations on a mouse genome!
Perform a left join (i.e., keep the original records of
GRanges A
, and attach metadata of intersectingGRanges B
) ofNaive_H3K27ac_peaks.autosomes.narrowPeak
andMus_musculus.GRCm38.84.chr.autosomes.gtf
, but keep the metadata columnsname
,signalValue
,qValue
,type
, andgene_id
of the combinedGRanges
object.Using the previous GRanges object, group the ranges by
type
and summarize the meansignalValue
for eachtype
. Remember that yoursignalValue
comes from the H3K27ac peak calls, and yourtype
column contains known classifications of genomic areas. Whattype
has the highest meansignalValue
? Does this make sense given the function of H3K27ac in mammalian genomes?Unfortunately, a left join means that some of your peaks will not have any genome information attached to them, as they did not overlap anything. What is a command that you could run to annotate the nearest genome information onto your peaks?
Can you think of any applications for
plyranges
that might be useful in your final project? Discuss with your group!