Generalized linear regression model with LASSO, group LASSO, and sparse group LASSO regularization methods for finding bacteria associated with colorectal cancer using microbiome data

Title: Generalized linear regression model with LASSO, group LASSO, and sparse group LASSO regularization methods for finding bacteria associated with colorectal cancer using microbiome data
Author: Bak, Stephen
Department: Department of Mathematics and Statistics
Program: Bioinformatics
Advisor: Dang, Sanjeena; Feng, Zeny
Abstract: With ever-increasing advances in microbiome sequencing technologies, the need for efficient statistical modelling of these systems has become apparent. Most microbiome data are highly sparse, which creates a problem for many conventional statistical analysis methods. For example, in the study of Nakatsu et al. (2015), 16S ribosomal RNA sequencing data were collected from the colon tissue of healthy, carcinoma-afflicted, and adenoma-afflicted subjects. One wishes to identify bacteria that are associated with these three health states. The ordinary binomial or multinomial regression model would fail to produce a meaningful analysis because of the large number of taxa and the sparsity of the taxonomic counts. In this thesis, we attempt to solve these problems by applying LASSO, group LASSO, and sparse group LASSO regularization to the multinomial and binomial regression models. Raw read microbiome sequencing data from the study of Nakatsu et al. (2015) are obtained from the Sequence Read Archive of NCBI. The software "mothur" is used to preprocess the sequence data and cluster the sequences into Operational Taxonomic Units (OTUs), and OTU counts are obtained for each taxon. We find that, in general, similar bacteria are chosen for the healthy and adenoma phenotypes, and different bacteria are chosen for the carcinoma phenotype. Proteobacteria are more often selected under the normal phenotype, whereas Fusobacterium is more often selected under the carcinoma phenotype. The bacteria selected for the adenoma phenotype generally resemble those selected for the other two phenotypes, but with different coefficients.
URI: http://hdl.handle.net/10214/12096
Date: 2017-12
Rights: Attribution-ShareAlike 2.5 Canada


Files in this item

File Size Format Description
Bak_Stephen_201712_MSc.pdf 854.7Kb PDF Full thesis

Except where otherwise noted, this item's license is described as Attribution-ShareAlike 2.5 Canada