9 Project description
In this project, you will work in groups to conduct gene-centric mapping of functional and phylogenetic anchors encoded in microbial genomes sourced from the Saanich Inlet water column. Saanich Inlet is a seasonally anoxic fjord on the coast of Vancouver Island, British Columbia, that serves as a model ecosystem for studying microbial community responses to ocean deoxygenation.
Numerous studies from Saanich Inlet over the years have identified key microbial players mediating coupled biogeochemical cycling of carbon, nitrogen and sulfur, with findings extensible to expanding marine oxygen minimum zones throughout the global ocean. Here, you will use metagenomic and metatranscriptomic data sets generated from Cruise 72 spanning 7 depths in Saanich Inlet. In addition to fastq reads, the metagenomic data sets include assembled contigs, metagenome-assembled genomes (MAGs) and single-cell amplified genomes (SAGs) spanning the genomic information hierarchy.
The Cruise 72 metagenomic and metatranscriptomic data sets can be evaluated in relation to geochemical parameter information to better resolve both the metabolic potential and gene expression patterns of microbial communities inhabiting the Saanich Inlet water column. Each capstone project group will have the opportunity to use TreeSAPP and iTOL to chart the abundance, expression and taxonomic diversity of functional and phylogenetic anchor genes represented in the current reference package collection. Groups can augment this collection with new reference packages produced during the TreeSAPP tutorial or at any time during the project phase, depending on their research interests. Reference packages can also be updated using taxonomic information associated with the MAGs and SAGs described above.
The underlying conceptual framework for this project is described in the three lectures associated with course Module 2, along with examples of data visualization approaches useful in developing a coherent scientific narrative. For a given gene encoding a biochemical transformation, consider the extent to which this functionality is distributed within the community and how it is represented at different levels of biological information flow, e.g. DNA versus RNA.
9.1 Guiding research questions
Several options are provided to help you develop your capstone project, although each group is free to come up with an original plan of action. Please consult with the teaching team if you have questions.
- Select your reference package(s) for analysis based on one of the following options:
  - Choose one or more pathways of a geochemical cycle, e.g. nitrogen, with reference packages already available for TreeSAPP (see table of available TreeSAPP reference packages). For example, you could select NapA, NirK, NirS, NorB, NorC and NosZ to investigate denitrification in the Saanich Inlet water column.
  - Create new reference packages to expand the TreeSAPP collection and analyze the results. You could select a pathway within a biogeochemical cycle which does NOT have reference packages available covering the complete pathway, e.g. the sulfur cycle including sulfur oxidation and DMSP conversion, or a completely new pathway with no reference packages available.
  - Select all reference packages in the collection, including those developed by the class during the treesapp create tutorial, and map the results. Focus on evaluating the purity of each reference package and update as needed to reflect the diversity of genes endemic to the Saanich Inlet water column.
- Perform a preliminary analysis using the Saanich Inlet data for all depths.
- Update reference packages as needed based on the preliminary analysis.
- Look for patterns in gene abundance, expression and diversity as a function of water column geochemical parameter information.
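One way to start looking for patterns against geochemistry is a rank correlation between a marker's abundance and a single geochemical parameter across depths. The sketch below is a minimal, standard-library Python illustration only: the oxygen and abundance values are hypothetical toy numbers, not Cruise 72 measurements, and the function names are invented for this example. The same logic could be written in your treesapp_analysis.R script instead.

```python
import math

def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical values at 7 depths, shallow to deep (NOT real data).
oxygen = [200, 50, 20, 5, 0, 0, 0]
marker_abundance = [1, 3, 8, 15, 20, 22, 25]

rho = spearman(oxygen, marker_abundance)
print(f"Spearman rho = {rho:.3f}")
```

With only 7 depths, a correlation like this is a descriptive aid for spotting trends rather than strong statistical evidence, so interpret it alongside the depth profiles themselves.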
9.1.1 Questions for project with a focus on biogeochemical cycles (repeated in report structure below)
1. How does gene and transcript abundance vary across depths for each reference package for a given biogeochemical cycle?
2. How does microbial diversity differ among and between steps within a pathway, e.g. denitrification? Are these trends similar for both genes and transcripts at different depths?
3. What is the taxonomic breakdown of genes and transcripts within a given biogeochemical cycle? Can you discern evidence for horizontal gene transfer of one or more functional anchor genes?
4. Can you identify evidence for distributed metabolism within one or more pathways? Are these trends similar for both genes and transcripts at different depths?
5. How do answers to questions 2 and 3 vary depending on the taxonomic rank used in the analysis?
6. How does the abundance, expression and taxonomy of identified genes relate to water column geochemical parameter information (use the geochemical data in Saanich_Data.csv from our previous data science sessions)?
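Several of these questions come down to summing abundance per depth and marker, then regrouping the same records at different taxonomic ranks. The following is a minimal Python sketch of that bookkeeping using only the standard library. The record layout, the semicolon-separated lineage strings, the depths and all abundance values are assumptions made for illustration; check them against your actual TreeSAPP output before adapting the idea to treesapp_analysis.R.

```python
from collections import defaultdict

# Hypothetical toy records: (depth_m, marker, lineage, abundance).
# These are NOT real results; the layout is an assumption for illustration.
records = [
    (100, "NirK", "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria", 12.0),
    (100, "NirK", "d__Archaea; p__Thaumarchaeota", 30.0),
    (100, "NosZ", "d__Bacteria; p__Proteobacteria", 4.5),
    (150, "NirK", "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria", 8.0),
    (150, "NosZ", "d__Bacteria", 2.5),
]

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]

def truncate_lineage(lineage, rank):
    """Cut a lineage at `rank`; return None if it is not resolved that deeply."""
    parts = [p.strip() for p in lineage.split(";") if p.strip()]
    n_ranks = RANKS.index(rank) + 1
    if len(parts) < n_ranks:
        return None
    return "; ".join(parts[:n_ranks])

def abundance_by_depth_and_marker(rows):
    """Sum abundance for every (depth, marker) pair."""
    totals = defaultdict(float)
    for depth, marker, _lineage, abundance in rows:
        totals[(depth, marker)] += abundance
    return dict(totals)

def abundance_by_taxon(rows, marker, rank):
    """Sum abundance per taxon at `rank` for one marker across all depths."""
    totals = defaultdict(float)
    for _depth, row_marker, lineage, abundance in rows:
        if row_marker != marker:
            continue
        taxon = truncate_lineage(lineage, rank) or "unresolved"
        totals[taxon] += abundance
    return dict(totals)

totals = abundance_by_depth_and_marker(records)
nirk_by_phylum = abundance_by_taxon(records, "NirK", "phylum")
nosz_by_phylum = abundance_by_taxon(records, "NosZ", "phylum")
print(totals)
print(nirk_by_phylum)
print(nosz_by_phylum)
```

Running the same grouping at "phylum" and then at "class" shows directly how the taxonomic breakdown, and the share of "unresolved" abundance, shifts with rank. The same functions can be applied to gene and transcript tables separately to compare the two.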
9.1.2 Questions for project with a focus on using all reference packages
In addition to the questions provided above for biogeochemical cycles, consider quality control metrics in your analysis.
How does the purity of each reference package impact your results?
How much information is discarded due to insufficient taxonomic resolution?
How much information is retained after updating reference packages with MAGs or SAGs?
What is the impact on diversity metrics following the use of updated reference packages?
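Two of these quality control questions can be made concrete with simple calculations: the fraction of abundance that is dropped because a classification does not reach the chosen rank, and a diversity index computed on what remains. The sketch below is a hedged Python illustration using the Shannon index with natural logarithms. The "before" and "after" lists are hypothetical toy values standing in for classifications made with an original and an updated reference package, and the function names are invented for this example.

```python
import math

# Hypothetical (lineage, abundance) pairs for one marker at one depth.
# "before" = original reference package, "after" = updated package (toy values).
before = [
    ("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria", 6.0),
    ("d__Bacteria; p__Proteobacteria", 2.0),
    ("d__Bacteria", 2.0),
]
after = [
    ("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria", 6.0),
    ("d__Bacteria; p__Proteobacteria; c__Alphaproteobacteria", 2.0),
    ("d__Bacteria; p__Proteobacteria", 2.0),
]

def lineage_parts(lineage):
    """Split a semicolon-separated lineage into non-empty rank labels."""
    return [p.strip() for p in lineage.split(";") if p.strip()]

def fraction_discarded(pairs, n_ranks):
    """Fraction of total abundance whose lineage has fewer than n_ranks levels."""
    total = sum(abundance for _, abundance in pairs)
    if total == 0:
        return 0.0
    lost = sum(a for lineage, a in pairs if len(lineage_parts(lineage)) < n_ranks)
    return lost / total

def shannon(pairs, n_ranks):
    """Shannon index (natural log) over taxa resolved to at least n_ranks levels."""
    totals = {}
    for lineage, abundance in pairs:
        parts = lineage_parts(lineage)
        if len(parts) < n_ranks:
            continue  # insufficient resolution: excluded from the index
        key = "; ".join(parts[:n_ranks])
        totals[key] = totals.get(key, 0.0) + abundance
    n = sum(totals.values())
    if n == 0:
        return 0.0
    return -sum((v / n) * math.log(v / n) for v in totals.values() if v > 0)

CLASS_LEVEL = 3  # domain, phylum, class
for label, pairs in (("before", before), ("after", after)):
    print(label, fraction_discarded(pairs, CLASS_LEVEL), shannon(pairs, CLASS_LEVEL))
```

Reporting the discarded fraction next to each diversity value makes it clear whether a change in diversity reflects the community or simply a change in how much data survived the rank filter.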
9.2 Resources
You will be provided with a script template for both the shell and R portions of your analysis (treesapp_analysis.sh and treesapp_analysis.R) that will guide you as you develop your code.
As stated in the project description, you will have access to metagenomic and metatranscriptomic data sets generated from Cruise 72 spanning 7 depths in Saanich Inlet, as well as MAGs and SAGs.
9.3 Your submission
Your final submission will consist of 3 separate files: the report itself (docx or pdf), one shell script treesapp_analysis.sh, and one R script treesapp_analysis.R (both script files must be in plain text format). The report should not contain any code, but should contain versions of software tools used and a high-level description of your workflow (i.e. describe what was done and NOT how).
9.4 Timeline
The following provides an outline as well as some specific milestones within the project.
Date | Description |
---|---|
Mar 18 | Introduction and begin running TreeSAPP |
Mar 21, 23, 25 | TreeSAPP tutorials |
Mar 28 | Start of capstone project. Groups present project ideas |
Mar 30, Apr 1, 4, 6 | Assisted group work |
Apr 8 | Project discussion |
Apr 11 | Course recap and discussion |
Apr 12–27 | Report writing. Groups are expected to meet remotely as needed over the Finals Period in order to complete the report. This report serves as a final for this course and should be treated as such. |
Apr 24 | Final due date for reports |
9.5 Reports
Reports should be formatted as per the Instructions to Authors for the Journal of Bacteriology.
Each group will submit one report with the sections below.
Section | Description |
---|---|
Abstract (200–250 words) | Note that an Importance section is not required. |
Introduction (500–750 words) | Introduce Saanich Inlet as a model ecosystem for studying microbial community responses to ocean deoxygenation, e.g. seasonal cycles, relevant biogeochemistry, previous studies, etc. Overview of a geochemical cycle including its global impacts, microbial foundations and involved genes. |
Methods (300–500 words) | Briefly describe the data (sampling, sequencing, processing, etc.). Briefly describe your analysis methods including Provide one single shell script and one single R script (i.e |
Results (500–750 words) | Your analysis should address the guiding research questions you have developed. You must include ≥ 5 figures/panels with titles and full captions. These figures can be combined into multi-panel figures if desired. |
Discussion (750–1000 words) | Frame results within a broader discussion of Saanich Inlet (Apr 14 discussion). Propose evolutionary, environmental, etc. reasoning for distributed metabolism as seen in the geochemical pathway. Future directions. |
References (10 or more) | Formatted in the ASM style such as for the Journal of Bacteriology. If you are using a reference manager, this style can be downloaded for EndNote, Mendeley, or Zotero. Make sure to include citations for the data source papers and software tools used! |