9 Project description

In this project, you will work in groups to conduct gene-centric mapping of functional and phylogenetic anchors encoded in microbial genomes sourced from the Saanich Inlet water column. Saanich Inlet is a seasonally anoxic fjord on the coast of Vancouver Island, British Columbia, that serves as a model ecosystem for studying microbial community responses to ocean deoxygenation.

Numerous studies of Saanich Inlet over the years have identified key microbial players mediating the coupled biogeochemical cycling of carbon, nitrogen and sulfur, with relevance to expanding marine oxygen minimum zones throughout the global ocean. Here, you will use metagenomic and metatranscriptomic data sets generated from Cruise 72, spanning 7 depths in Saanich Inlet. In addition to fastq reads, the metagenomic data sets include assembled contigs, metagenome-assembled genomes (MAGs) and single-cell amplified genomes (SAGs), spanning the genomic information hierarchy.

The Cruise 72 metagenomic and metatranscriptomic data sets can be evaluated in relation to geochemical parameter information to better resolve both the metabolic potential and gene expression patterns of microbial communities inhabiting the Saanich Inlet water column. Each capstone project group will have the opportunity to use TreeSAPP and iTOL to chart the abundance, expression and taxonomic diversity of functional and phylogenetic anchor genes represented in the current reference package collection. Groups can augment this collection with new reference packages produced during the TreeSAPP tutorial or at any time during the project phase, depending on their research interests. Reference packages can also be updated using taxonomic information associated with the MAGs and SAGs described above.

The underlying conceptual framework for this project is described in the three lectures associated with course Module 2, along with examples of data visualization approaches useful in developing a coherent scientific narrative. For a given gene encoding a biochemical transformation, consider the extent to which this functionality is distributed within the community and how it is represented at different levels of biological information flow, e.g., DNA versus RNA.
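
One common way to compare DNA and RNA for a single gene is the ratio of its normalized transcript abundance to its normalized gene abundance at each depth. The sketch below is a minimal illustration. It assumes you have already reduced TreeSAPP abundance output for one reference package to per-depth values in a shared unit (e.g., TPM). The function name, dictionary layout and depth labels are hypothetical, and your R script may compute this differently. Ratios are sensitive to normalization and low counts, which is why an optional pseudocount is included.

```python
from typing import Dict, Optional


def rna_dna_ratio(
    dna: Dict[str, float],
    rna: Dict[str, float],
    pseudocount: float = 0.0,
) -> Dict[str, Optional[float]]:
    """Return the RNA:DNA abundance ratio for each depth in `dna`.

    `dna` and `rna` map a depth label to a normalized abundance (e.g. TPM)
    for one gene/reference package (hypothetical layout). Depths missing
    from `rna` are treated as having zero transcripts. A depth gets None
    when the denominator is zero.
    """
    ratios: Dict[str, Optional[float]] = {}
    for depth, dna_abund in dna.items():
        rna_abund = rna.get(depth, 0.0)
        denominator = dna_abund + pseudocount
        if denominator == 0:
            ratios[depth] = None  # gene not detected in the metagenome here
        else:
            ratios[depth] = (rna_abund + pseudocount) / denominator
    return ratios


if __name__ == "__main__":
    # Toy values for illustration only; these are not Saanich Inlet results.
    dna_tpm = {"100m": 4.0, "200m": 0.0}
    rna_tpm = {"100m": 2.0, "200m": 1.5}
    print(rna_dna_ratio(dna_tpm, rna_tpm))  # {'100m': 0.5, '200m': None}
```
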

9.1 Guiding research questions

Several options are provided to help you develop your capstone project, although each group is free to come up with an original plan of action. Please consult with the teaching team if you have questions.

  1. Select your reference package(s) for analysis based on one of the following options:

    1. Choose one or more pathways of a geochemical cycle (e.g., nitrogen) with reference packages already available for TreeSAPP (see the table of available TreeSAPP reference packages). For example, you could select NapA, NirK, NirS, NorB, NorC and NosZ to investigate denitrification in the Saanich Inlet water column.

    2. Create new reference packages to expand the TreeSAPP collection and analyze the results. You could select a pathway within a biogeochemical cycle that does NOT have reference packages covering the complete pathway (e.g., the sulfur cycle, including sulfur oxidation and DMSP conversion), or a completely new pathway with no reference packages available.

    3. Select all reference packages in the collection, including those developed by the class during the treesapp create tutorial, and map the results. Focus on evaluating the purity of each reference package and update as needed to reflect the diversity of genes endemic to the Saanich Inlet water column.

  2. Perform a preliminary analysis using the Saanich Inlet data for all depths.

  3. Update reference packages as needed based on preliminary analysis.

  4. Look for patterns in gene abundance, expression and diversity as a function of water column geochemical parameter information.
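
A rank correlation between per-depth abundance and a geochemical parameter is one simple starting point for finding such patterns. The sketch below implements Spearman's rho, computed as the Pearson correlation of average ranks, in plain Python. It assumes the two sequences are already aligned by depth. Loading Saanich_Data.csv is left out because its column names are not specified here, and the function names are illustrative. In your R script, base R's cor(x, y, method = "spearman") computes the same statistic. With only seven depths, treat such correlations as descriptive rather than conclusive.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two aligned sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy per-depth values for illustration only (not real measurements).
    gene_abundance = [0.1, 0.5, 2.0, 3.5]
    oxygen = [200.0, 90.0, 10.0, 0.0]
    print(round(spearman(gene_abundance, oxygen), 3))  # -1.0
```
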

9.1.1 Questions for project with a focus on biogeochemical cycles (repeated in report structure below)

  1. How does gene and transcript abundance vary across depths for each reference package for a given biogeochemical cycle?

  2. How does microbial diversity differ among and between steps within a pathway, e.g., denitrification? Are these trends similar for both genes and transcripts at different depths?

  3. What is the taxonomic breakdown of genes and transcripts within a given biogeochemical cycle? Can you discern evidence for horizontal gene transfer of one or more functional anchor genes?

  4. Can you identify evidence for distributed metabolism within one or more pathways? Are these trends similar for both genes and transcripts at different depths?

  5. How do answers to questions 2 and 3 vary depending on the taxonomic rank used in the analysis?

  6. How does the abundance, expression and taxonomy of identified genes relate to water column geochemical parameter information (use the geochemical data in Saanich_Data.csv from our previous data science sessions)?
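
For the distributed metabolism question, one way to frame the analysis is to collapse classified sequences into the set of pathway steps that each taxon encodes, then flag taxa that encode only part of the pathway. The sketch below is hypothetical. It assumes you have exported (taxon, reference package) pairs from your TreeSAPP classifications at a chosen rank, and it uses the denitrification reference packages named in option 1 above. A taxon labelled "partial" is not proof of distributed metabolism. Incomplete MAGs, undetected genes or unresolved classifications can produce the same pattern, and the result will change with the taxonomic rank you choose (question 5).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Reference package -> denitrification step it is associated with
# (packages from the denitrification example; mapping is an illustration).
DENITRIFICATION_STEPS: Dict[str, str] = {
    "NapA": "NO3- -> NO2-",
    "NirK": "NO2- -> NO",
    "NirS": "NO2- -> NO",
    "NorB": "NO -> N2O",
    "NorC": "NO -> N2O",
    "NosZ": "N2O -> N2",
}


def steps_per_taxon(
    hits: Iterable[Tuple[str, str]], step_map: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Collapse (taxon, reference package) pairs into the steps each taxon encodes."""
    steps: Dict[str, Set[str]] = defaultdict(set)
    for taxon, refpkg in hits:
        if refpkg in step_map:  # ignore packages outside this pathway
            steps[taxon].add(step_map[refpkg])
    return dict(steps)


def classify_taxa(
    steps: Dict[str, Set[str]], step_map: Dict[str, str]
) -> Dict[str, str]:
    """Label each taxon 'complete' if it encodes every step, else 'partial'."""
    all_steps = set(step_map.values())
    return {
        taxon: "complete" if encoded >= all_steps else "partial"
        for taxon, encoded in steps.items()
    }


if __name__ == "__main__":
    # Toy (taxon, reference package) pairs for illustration only.
    hits = [
        ("g__TaxonA", "NapA"), ("g__TaxonA", "NirS"),
        ("g__TaxonA", "NorB"), ("g__TaxonA", "NosZ"),
        ("g__TaxonB", "NosZ"),
    ]
    per_taxon = steps_per_taxon(hits, DENITRIFICATION_STEPS)
    print(classify_taxa(per_taxon, DENITRIFICATION_STEPS))
    # {'g__TaxonA': 'complete', 'g__TaxonB': 'partial'}
```
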

9.1.2 Questions for project with a focus on using all reference packages

In addition to the questions provided above for biogeochemical cycles, consider quality control metrics in your analysis.

  1. How does the purity of each reference package impact your results?

  2. How much information is discarded due to insufficient taxonomic resolution?

  3. How much information is retained after updating reference packages with MAGs or SAGs?

  4. What is the impact on diversity metrics following the use of updated reference packages?
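
Questions 2 and 4 can both be approached by collapsing each classified sequence to a chosen rank. Sequences whose lineage stops above that rank count as discarded information, and the remaining sequences feed a diversity index. The sketch below computes the Shannon index H' = -Σ p_i ln p_i on unweighted sequence counts, together with the unresolved fraction. It assumes lineages are semicolon-separated strings with rank prefixes such as "p__"; check this against the output of your TreeSAPP version. The function names are hypothetical. Weighting by abundance (e.g., TPM) instead of counts would be a straightforward change. Running it before and after updating a reference package gives a direct comparison for questions 3 and 4.

```python
from collections import Counter
from math import log
from typing import Iterable, Optional, Tuple


def truncate_lineage(lineage: str, rank_prefix: str) -> Optional[str]:
    """Return the lineage down to `rank_prefix`, or None if not resolved that deep.

    Assumed format: "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria".
    """
    parts = [p.strip() for p in lineage.split(";") if p.strip()]
    for i, part in enumerate(parts):
        if part.startswith(rank_prefix):
            return "; ".join(parts[: i + 1])
    return None


def shannon_at_rank(lineages: Iterable[str], rank_prefix: str) -> Tuple[float, float]:
    """Return (Shannon H' over resolved taxa, fraction unresolved at this rank)."""
    counts: Counter = Counter()
    total = 0
    for lineage in lineages:
        total += 1
        taxon = truncate_lineage(lineage, rank_prefix)
        if taxon is not None:
            counts[taxon] += 1
    resolved = sum(counts.values())
    if resolved == 0:
        # Nothing classified at this rank: no diversity, everything discarded.
        return 0.0, (1.0 if total else 0.0)
    shannon = -sum((c / resolved) * log(c / resolved) for c in counts.values())
    return shannon, (total - resolved) / total


if __name__ == "__main__":
    # Toy lineages for illustration only.
    lineages = [
        "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
        "d__Bacteria; p__Proteobacteria",
        "d__Bacteria; p__Bacteroidota",
        "d__Bacteria",
    ]
    for rank in ("p__", "c__"):
        print(rank, shannon_at_rank(lineages, rank))
```
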

9.2 Resources

You will be provided with a script template for both the shell and R portion of your analysis (treesapp_analysis.sh and treesapp_analysis.R) that will guide you as you develop your code.

As stated in the project description, you will have access to metagenomic and metatranscriptomic data sets generated from Cruise 72 spanning 7 depths in Saanich Inlet, as well as the associated MAGs and SAGs.

9.3 Your submission

Your final submission will consist of 3 separate files: the report itself (docx or pdf), one shell script treesapp_analysis.sh, and one R script treesapp_analysis.R (both script files must be in plain text format). The report should not contain any code, but it should list the versions of the software tools used and give a high-level description of your workflow (i.e., describe what was done and NOT how).

9.4 Timeline

The following provides an outline as well as some specific milestones within the project.

Capstone timeline

Date                  Description
Mar 18                Introduction and begin running TreeSAPP
Mar 21, 23, 25        TreeSAPP tutorials
Mar 28                Start of capstone project; groups present project ideas
Mar 30, Apr 1, 4, 6   Assisted group work
Apr 8                 Project discussion
Apr 11                Course recap and discussion
Apr 12–27             Report writing. Groups are expected to meet remotely as needed over the Finals Period in order to complete the report. This report serves as a final for this course and should be treated as such.
Apr 24                Final due date for reports

9.5 Reports

Reports should be formatted as per the Instructions to Authors for the Journal of Bacteriology.

Each group will submit one report with the sections below.

Report structure

Abstract: 200–250 words
  • Note that an Importance section is not required.

Introduction: 500–750 words
  • Introduce Saanich Inlet as a model ecosystem for studying microbial community responses to ocean deoxygenation, e.g., seasonal cycles, relevant biogeochemistry, previous studies, etc.
  • Give an overview of a geochemical cycle, including its global impacts, microbial foundations and the genes involved.

Methods: 300–500 words
  • Briefly describe the data (sampling, sequencing, processing, etc.).
  • Briefly describe your analysis methods, including:
      • TreeSAPP version and commands used
      • iTOL version
      • R version and packages used
      • Statistics (if applicable)
  • Provide one shell script and one R script (i.e., treesapp_analysis.sh and treesapp_analysis.R) as individual files (i.e., not as part of your manuscript) containing the complete code to generate your results.

Results: 500–750 words
  • Your analysis should address the guiding research questions you have developed.
  • You must include ≥ 5 figures/panels with titles and full captions. These figures can be combined into multi-panel figures if desired.

Discussion: 750–1000 words
  • Frame results within a broader discussion of Saanich Inlet (Apr 14 discussion).
  • Propose evolutionary, environmental or other reasoning for distributed metabolism as seen in the geochemical pathway.
  • Future directions

References: 10 or more
  • Format references in the ASM style, as used by the Journal of Bacteriology. If you are using a reference manager, this style can be downloaded for EndNote, Mendeley or Zotero.
  • Make sure to include citations for the data source papers and software tools used!