Dimension Reduction and Clustering using Non-Elliptical Mixtures
Dimension reduction methods for model-based clustering are introduced using finite mixtures of non-elliptical distributions, specifically the shifted asymmetric Laplace and the generalized hyperbolic. The approaches extend existing work on dimension reduction for finite Gaussian mixtures. The methods identify a reduced subspace of the data by considering the extent to which group means and group covariances vary. This subspace is spanned by linear combinations of the original variables, which are ordered by importance via the associated eigenvalues. Observations can be projected onto the subspace, and the resulting set of variables captures most of the clustering structure available in the data. The algorithms are illustrated using simulated and real data.

Furthermore, methods of detecting outliers are developed for model-based clustering using mixtures of contaminated shifted asymmetric Laplace distributions and mixtures of contaminated skew-normal distributions. The approaches extend existing work on outlier detection with contaminated Gaussian mixtures. The main idea is to introduce a contamination factor that increases the dispersion of the fitted distribution by altering the skewness and covariance parameters. An expectation-conditional maximization algorithm is employed to obtain maximum likelihood estimates of the model parameters. Thus, each observation is given a posterior probability of belonging to a particular group, and of being an outlier or not. The performance of the methods is tested on simulated and real data.
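
The two posterior probabilities mentioned above can be illustrated with a minimal sketch. It covers only the contaminated Gaussian case that the outlier-detection work builds on, not the contaminated shifted asymmetric Laplace or skew-normal models developed here, and it assumes the parameters have already been estimated. All names are hypothetical: `alphas` holds the assumed proportion of non-outlying points in each group and `etas` the assumed factor (greater than 1) by which the covariance is inflated for outliers. In the Gaussian case only the covariance is inflated; no skewness parameter is involved.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal distribution at the point x."""
    p = len(x)
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves low * z = diff, so sum(z**2) is the
    # squared Mahalanobis distance between x and mean.
    z = []
    for i in range(p):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    maha = sum(zi * zi for zi in z)
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(p))
    return -0.5 * (p * math.log(2.0 * math.pi) + logdet + maha)

def posteriors(x, weights, means, covs, alphas, etas):
    """Return (z, v) for one observation x.

    z[g] is the posterior probability that x belongs to group g;
    v[g] is the posterior probability that x is NOT an outlier, given group g.
    """
    good, total = [], []
    for pi_g, mu, sigma, alpha, eta in zip(weights, means, covs, alphas, etas):
        inflated = [[eta * s for s in row] for row in sigma]  # eta * Sigma
        f_good = alpha * math.exp(mvn_logpdf(x, mu, sigma))
        f_bad = (1.0 - alpha) * math.exp(mvn_logpdf(x, mu, inflated))
        good.append(f_good)
        total.append(pi_g * (f_good + f_bad))
    norm = sum(total)
    z = [t / norm for t in total]
    v = [pi_g * fg / t for pi_g, fg, t in zip(weights, good, total)]
    return z, v

# Hypothetical two-group, two-dimensional parameter values (not from the source).
identity = [[1.0, 0.0], [0.0, 1.0]]
params = dict(weights=[0.5, 0.5], means=[[0.0, 0.0], [5.0, 5.0]],
              covs=[identity, identity], alphas=[0.95, 0.95], etas=[20.0, 20.0])

z_typical, v_typical = posteriors([0.2, -0.1], **params)  # near the first mean
z_far, v_far = posteriors([-6.0, 4.0], **params)          # far from both means
```

An observation would typically be assigned to the group with the largest `z[g]` and flagged as an outlier when the corresponding `v[g]` falls below 0.5; that threshold is a common convention, assumed here rather than taken from the text.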