Comparing Metagenomics and Total RNA Sequencing for Microbial Identification and Environmental Assessments
Metagenomics and total RNA sequencing (total RNA-Seq) have the potential to improve the taxonomic identification of microbial communities and can be used to reconstruct novel reference sequences of microbial ribosomal RNA (rRNA), which could simplify the incorporation of microbes into environmental assessments. However, these target-PCR-free methods require more testing and optimization. In this thesis, I compared metagenomics and total RNA-Seq to contribute to the advancement of microbial identification, the inclusion of microbial diversity in environmental assessments, and the application of machine learning to environmental assessments. First, I compared how accurately both methods identify a microbial mock community. This comparison also involved extensive testing of bioinformatic data-processing tools. I demonstrated that total RNA-Seq was more accurate than metagenomics, even at considerably lower sequencing depths. While data-processing tools require further exploration, I conclude that total RNA-Seq might be a favorable alternative to metagenomics for target-PCR-free taxonomic identification of microbial communities. Subsequently, I used the same dataset to test whether total RNA-Seq or metagenomics reconstructs more complete small subunit (SSU) rRNA sequences for the mock community species. This is important because full-length SSU rRNA sequences reconstructed from high-throughput sequencing (HTS) data improve the accuracy of microbial community identification in comparison to shorter reference sequences. I showed that total RNA-Seq allowed for the complete or near-complete reconstruction of all mock community SSU rRNA sequences and outperformed metagenomics. Lastly, I applied total RNA-Seq and metagenomics to samples from a stream mesocosm experiment, in which mesocosms were exposed to two key aquatic stressors at multiple stressor levels.
Using these data together with amplicon sequencing data provided by collaborators, I investigated which taxonomic datasets, those based on total RNA-Seq, metagenomics, or amplicon sequencing, allowed for the best machine-learning predictions of stressor levels. Metagenomics and total RNA-Seq performed poorly overall, and 16S sequencing outperformed all other sequencing methods in stressor prediction, but this was likely related to the insufficient sequencing depth of the metagenomics and total RNA-Seq samples. This thesis demonstrates the advantage of total RNA-Seq for microbial identification but also the need for further research to refine the method for possible implementation in environmental assessments.