Title: Bayesian Clustering Approaches for Discrete Data
Author: Silva, H. Anjali
Department: Department of Mathematics and Statistics
Program: Bioinformatics
Advisor: Rothstein, Steven J.
Abstract: Unsupervised classification, or clustering, uses no a priori knowledge of the labels of the observations in the process of categorizing data. The research contained in this thesis focuses on the machine learning of discrete-valued gene expression datasets using clustering, with the aim of identifying gene co-expression networks. Specifically, a number of topics surrounding the use of mixture models and Markov chain Monte Carlo (MCMC) methods in clustering discrete data from high-throughput transcriptome sequencing technologies are presented. After outlining current challenges and gaps in research with respect to clustering approaches, three mixture model-based clustering methods are presented: mixtures of multivariate Poisson-log normal distributions, mixtures of multivariate Poisson-log normal factor analyzers, and mixtures of matrix-variate Poisson-log normal distributions. The significance, innovation, and limitations of this research, as well as a number of future directions stemming from it, are discussed.
URI: http://hdl.handle.net/10214/13025
Date: 2018-03
Terms of Use: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
Related Publications: Silva, A., Rothstein, S. J., McNicholas, P. D. and Subedi, S. (2017) A Multivariate Poisson-Log Normal Mixture Model for Clustering Transcriptome Sequencing Data. arXiv preprint arXiv:1711.11190.