Bayesian Clustering Approaches for Discrete Data

dc.contributor.advisor Rothstein, Steven J.
dc.contributor.author Silva, H. Anjali
dc.date.accessioned 2018-05-09T17:49:45Z
dc.date.available 2020-06-19T05:00:51Z
dc.date.copyright 2018-03
dc.date.created 2018-03-15
dc.date.issued 2017-11-30
dc.identifier.uri http://hdl.handle.net/10214/13025
dc.description.abstract Unsupervised classification, or clustering, categorizes data without using a priori knowledge of the observation labels. The research in this thesis focuses on applying clustering, a machine learning technique, to discrete-valued gene expression datasets, with the aim of identifying gene co-expression networks. Specifically, a number of topics surrounding the use of mixture models and Markov chain Monte Carlo (MCMC) methods for clustering discrete data from high-throughput transcriptome sequencing technologies are presented. After outlining current challenges and gaps in research on clustering approaches, three mixture model-based clustering methods are presented: mixtures of multivariate Poisson-log normal distributions, mixtures of multivariate Poisson-log normal factor analyzers, and mixtures of matrix-variate Poisson-log normal distributions. The significance, innovation, and limitations of this research are discussed, along with a number of future directions stemming from it. en_US
dc.description.sponsorship Queen Elizabeth II Graduate Scholarship in Science & Technology; Ontario Graduate Fellowship; Arthur Richmond Memorial Scholarship; Canadian Statistical Sciences Institute Travel Scholarship; Statistical Society of Canada Travel Scholarship; Women in Machine Learning Travel Scholarships. en_US
dc.language.iso en en_US
dc.publisher arXiv.org en_US
dc.subject Clustering en_US
dc.subject RNA sequencing en_US
dc.subject Discrete data en_US
dc.subject Multivariate Poisson-Log Normal distribution en_US
dc.subject Markov chain Monte Carlo en_US
dc.subject Factor analyzers en_US
dc.subject Matrix variate distribution en_US
dc.subject Co-expression network en_US
dc.title Bayesian Clustering Approaches for Discrete Data en_US
dc.type Article en_US
dc.degree.programme Bioinformatics en_US
dc.degree.name Doctor of Philosophy en_US
dc.degree.department Department of Mathematics and Statistics en_US
dc.rights.license All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
dcterms.relation Silva, A., Rothstein, S. J., McNicholas, P. D. and Subedi, S. (2017) A Multivariate Poisson-Log Normal Mixture Model for Clustering Transcriptome Sequencing Data. arXiv preprint arXiv:1711.11190.


Files in this item

Files Size Format Description
Silva_Anjali_201803_PhD.pdf 9.974Mb PDF Thesis
