Cluster analysis of microbiome data via mixtures of Dirichlet-multinomial regression models
The human gut microbiome is a source of genetic and metabolic diversity, and the relationship between biological/environmental covariates and the taxonomic composition of the gut microbial community is an active area of research. A Dirichlet-multinomial regression framework has previously been suggested to model this relationship, but it does not account for the latent group structure observed across microbiome samples, in which groups of samples share similar biota compositions (known as enterotypes). Here, a finite mixture of Dirichlet-multinomial regression models is proposed and illustrated. It accounts for the enterotype structure and allows a probabilistic investigation of the relationship between bacterial abundance and biological/environmental covariates within each inferred enterotype. Finite mixtures of regression models in which the covariates also act as concomitant variables on the mixing proportions are proposed and examined within the same Dirichlet-multinomial framework.
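
As a rough illustration of the model structure described above, the following Python sketch evaluates one sample's log-likelihood under a finite mixture of Dirichlet-multinomial regressions, with either constant or covariate-dependent mixing proportions. The log link alpha = exp(x'beta), the multinomial-logit form of the concomitant mixing proportions, and all names (dm_logpmf, mixing_log_probs, sample_loglik, betas, gammas) are assumptions made for illustration; the abstract does not specify them.

```python
import math

def dm_logpmf(y, alpha):
    """Log-pmf of a Dirichlet-multinomial for one vector of taxon counts y."""
    n, a0 = sum(y), sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for yj, aj in zip(y, alpha):
        out += math.lgamma(yj + aj) - math.lgamma(aj) - math.lgamma(yj + 1)
    return out

def logsumexp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = max(v)
    return m + math.log(sum(math.exp(t - m) for t in v))

def linear(x, B):
    """x: p covariates; B: p rows of K coefficients. Returns the K linear predictors."""
    return [sum(xi * row[k] for xi, row in zip(x, B)) for k in range(len(B[0]))]

def mixing_log_probs(x, gammas):
    """Concomitant mixing proportions (assumed multinomial logit); gammas is p x G."""
    eta = linear(x, gammas)
    z = logsumexp(eta)
    return [e - z for e in eta]

def sample_loglik(y, x, betas, log_pi):
    """Mixture log-likelihood and component responsibilities for one sample.

    betas: list of G coefficient matrices (each p x J), one per component.
    log_pi: G log mixing proportions, constant or from mixing_log_probs(x, gammas).
    """
    joint = []
    for lp, B in zip(log_pi, betas):
        alpha = [math.exp(e) for e in linear(x, B)]  # assumed log link
        joint.append(lp + dm_logpmf(y, alpha))
    ll = logsumexp(joint)
    return ll, [math.exp(j - ll) for j in joint]

# Hypothetical example: 3 taxa, intercept plus one covariate, 2 components.
betas = [[[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]],
         [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]]
gammas = [[0.0, 0.3], [0.0, -0.2]]
x = [1.0, 0.5]
ll, resp = sample_loglik([3, 0, 1], x, betas, mixing_log_probs(x, gammas))
```

The returned responsibilities are the posterior probabilities that the sample belongs to each component, which in this setting would correspond to soft enterotype assignments. Passing fixed log proportions instead of `mixing_log_probs(x, gammas)` gives the variant without concomitant covariates.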