Variational Approximations and Other Topics in Mixture Models

Date

2012-08-24

Authors

Dang, Sanjeena

Publisher

University of Guelph

Abstract

Mixture model-based clustering has become an increasingly popular data analysis technique since its introduction almost fifty years ago. Families of mixture models arise when the component parameters, usually the component covariance matrices, are decomposed and a number of constraints are imposed. Within the family setting, one must choose the member of the family, i.e., the appropriate covariance structure, in addition to the number of mixture components. To date, the Bayesian information criterion (BIC) has proved most effective for this model selection process, and the expectation-maximization (EM) algorithm has been used predominantly for parameter estimation. We deviate from the EM-BIC rubric, using variational Bayes approximations for parameter estimation and the deviance information criterion (DIC) for model selection. The variational Bayes approach alleviates some of the computational complexities associated with the EM algorithm. We apply this approach to the best-known family of Gaussian mixture models, the Gaussian parsimonious clustering models (GPCMs), which have an eigen-decomposed covariance structure. Cluster-weighted modelling (CWM) is another flexible statistical framework for modelling local relationships in heterogeneous populations via weighted combinations of local models. In particular, we extend cluster-weighted models to include an underlying latent factor structure of the independent variable, resulting in a novel family of models known as parsimonious cluster-weighted factor analyzers; the EM-BIC rubric is used for their parameter estimation and model selection. Some work on a mixture of multivariate t-distributions is also presented, with a linear model for the mean and a modified Cholesky-decomposed covariance structure leading to a novel family of mixture models.
In addition to model-based clustering, these models are used for model-based classification, i.e., semi-supervised clustering. Parameters are estimated using the EM algorithm, and an alternative to the BIC for model selection is also considered.
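The EM-BIC rubric the abstract departs from can be illustrated with a minimal sketch: fit Gaussian mixtures with increasing numbers of components via EM, score each fit with the BIC, and keep the model with the smallest value. The univariate implementation below is purely illustrative and not the thesis's methodology, which concerns families of models with decomposed covariance structures; the initialization and iteration count are arbitrary choices for the example.

```python
import numpy as np

def gmm_density(x, w, mu, var):
    """Per-component weighted Gaussian densities, shape (n, k)."""
    return w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_gmm_1d(x, k, n_iter=200):
    """Fit a k-component univariate Gaussian mixture by EM.

    Returns (weights, means, variances, log-likelihood)."""
    # Deterministic initialization: spread the means over empirical quantiles.
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities (posterior component memberships).
        dens = gmm_density(x, w, mu, var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, 1e-6)  # guard against degenerate components
    ll = np.log(gmm_density(x, w, mu, var).sum(axis=1)).sum()
    return w, mu, var, ll

def bic(ll, n_params, n):
    """BIC = -2 log L + p log n; smaller is better in this convention."""
    return -2.0 * ll + n_params * np.log(n)

# Two well-separated clusters, so the BIC should select k = 2.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(4.0, 1.0, 300)])
# Free parameters per model: (k - 1) weights + k means + k variances = 3k - 1.
scores = {k: bic(em_gmm_1d(x, k)[3], 3 * k - 1, len(x)) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
print(best_k)
```

The extra-parameter penalty `p log n` is what lets the BIC reject the three-component fit even though its log-likelihood is at least as high; in the family setting described above, the same comparison runs over covariance structures as well as component counts.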

Keywords

High-dimensional Data, Variational Bayes Approximations, Mixture Models, EM Algorithm, Factor Analyzers, Longitudinal Data, Gene Expression Data, Cluster-Weighted Models, Classification, Clustering, Model-based Clustering, Family of Mixture Models, Model-based Classification, Cluster-Weighted Factor Analyzers