Model-based clustering of high-dimensional binary data

dc.contributor.advisor McNicholas, Paul D.
dc.contributor.advisor Browne, Ryan P.
dc.contributor.author Tang, Yang
dc.date.accessioned 2013-09-05T16:42:41Z
dc.date.available 2013-09-05T16:42:41Z
dc.date.copyright 2013-08
dc.date.created 2013-08-16
dc.date.issued 2013-09-05
dc.identifier.uri http://hdl.handle.net/10214/7458
dc.description.abstract We present a mixture of latent trait models with common slope parameters (MCLT) for high-dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based on a d-dimensional Gaussian latent variable, is extended by implementing common factor analyzers. We extend the model further by incorporating random block effects. The dependencies in each block are taken into account through block-specific parameters that are treated as random variables. A variational approximation to the likelihood is exploited to derive a fast algorithm for estimating the model parameters. The Bayesian information criterion is used to select the number of components, the covariance structure, and the dimension of the latent variables. Our approach is demonstrated on U.S. Congressional voting data and on a data set describing the sensory properties of orange juice. These examples show that the model performs well even when the number of observations is not very large relative to the data dimensionality. In both cases, our approach yields intuitive clustering results. Additionally, our dimensionality-reduction method allows data to be displayed in low-dimensional plots. en_US
dc.description.sponsorship Early Researcher Award from the Government of Ontario (McNicholas); NSERC Discovery Grants (Browne and McNicholas). en_US
dc.language.iso en en_US
dc.subject binary data en_US
dc.subject clustering en_US
dc.subject high dimension en_US
dc.subject latent variables en_US
dc.subject mixture models en_US
dc.title Model-based clustering of high-dimensional binary data en_US
dc.type Thesis en_US
dc.degree.programme Mathematics and Statistics en_US
dc.degree.name Master of Science en_US
dc.degree.department Department of Mathematics and Statistics en_US


Files in this item

Files Size Format View
Tang_Yang_201308_Msc.pdf 3.894Mb PDF View/Open
