3 FastQC

Slides

You can download the slides for this tutorial below.

Exercise 1

Navigate to /projects/micb405/data/Ebola. You should see paired-end FASTQ files. Let’s say these files just came off the sequencer, and you want to take a quick look ensure nothing went awry during the sequencing process. You’re not interested in the exact stats, but you still want to take a look at all the modules in the graphical HTML report.

Perform FastQC on these files, outputting the results to somewhere in your home directory.

Next, scp just the HTML files to your computer. (Hint: remember how * refers to any name of a file within the folder. How might you modify this to refer to files ending in .html?)

Now, find the transferred pair of HTML files on your computer and open them with the web browser of your choice!

Exercise 2

Take a look at the different modules in the FastQC output. Check out “Per base sequence quality”, “Per sequence quality scores”, and “Per base sequence content” Describe why are these considered acceptable by FastQC. For “Per base sequence content”, what would you need to change for FastQC to flag that module with a warning?

Take a look at “Per sequence GC content”. What went wrong here? Would shifting this distribution so that the mean matched that of the theoretical mean be sufficient for FastQC to flag it as acceptable?

Bonus

Look at “Sequence Duplication Levels”. This pattern is very distinct, and it’s highly unusual for a whole-genome sequencing (DNA) run. However, let’s say you know that this sequencing run was done perfectly. What kind of genetic or epigenetic material could have been sequenced here, rather than the simple DNA that FastQC expects, which causes some of the same sequences to appear many times?

Look at the help page for fastp and choose one trimming or filtering step. Apply this filtering to the Ebola FASTQs and output the filtered FASTQ to a directory in your home directory. When you run fastqc on these filtered FASTQs, can you observe the difference in the report stemming from the filtering steps you enacted? You can also use this sandbox to play around with different fastp parameters!