4 Working with SAMtools and BCFtools Part 1
Main Exercise
-
Create a new SAM file by aligning the following FASTQ files against your bordetella reference genome using
bwa mem
in a script.# FASTA /projects/micb405/analysis/references/ASM107827v1/GCA_001078275.1_ASM107827v1_genomic.fna # FASTQ /projects/micb405/data/bordetella/F01_R1_1M.fastq /projects/micb405/data/bordetella/F01_R2_1M.fastq
Run
samtools flagstat
on your resulting SAM file. What do the different lines mean?Next, create a script that processes the resulting SAM into an indexed, sorted BAM with duplicates removed.
For sorting and duplicate removal, what is a “sanity check” step you could perform on the output (hint… what function within
samtools
can allow you to check your binary BAM file by eye) to verify that reads have been sorted (easier), or that duplicates have been removed (harder)? What would you be looking for to confirm that your commands worked as intended?Then, add a line in your script to call variants using
bcftools
! This may take a while, so how could you run it so that it happens in the background and you could check back on it later?