4 Working with SAMtools and BCFtools Part 1

Slides

You can download the slides for this tutorial below.

Main Exercise

  1. Create a new SAM file by aligning the following FASTQ files against your bordetella reference genome using bwa mem in a script.

    # FASTA
    /projects/micb405/analysis/references/ASM107827v1/GCA_001078275.1_ASM107827v1_genomic.fna
    
    # FASTQ
    /projects/micb405/data/bordetella/F01_R1_1M.fastq
    /projects/micb405/data/bordetella/F01_R2_1M.fastq
  2. Run samtools flagstat on your resulting SAM file. What do the different lines mean?

  3. Next, create a script that processes the resulting SAM into an indexed, sorted BAM with duplicates removed.

  4. For sorting and duplicate removal, what is a “sanity check” step you could perform on the output (hint… what function within samtools can allow you to check your binary BAM file by eye) to verify that reads have been sorted (easier), or that duplicates have been removed (harder)? What would you be looking for to confirm that your commands worked as intended?

  5. Then, add a line in your script to call variants using bcftools! This may take a while, so how could you run it so that it happens in the background and you could check back on it later?

Bonus

Download your reference, bam, bam.bai, vcf.gz, and vc.gz.tbi files and view them in IGV viewer. Go to one of the areas marked as a variant by bcftools call. Do you agree with this “call”?