Conditional Replicated Softmax for Topic Modelling with Metadata

Date

2019-09-20

Authors

Austria, Charles

Publisher

University of Guelph

Abstract

Topic models are popular tools for extracting semantic topics from collections of documents. Documents often come with metadata such as authors, dates, or publication venues; however, current state-of-the-art topic models do not incorporate metadata. This thesis introduces the conditional replicated softmax model, an undirected graphical model that uses restricted Boltzmann machines to model document word counts together with document-specific metadata. An additional input layer associated with the metadata is added to the replicated softmax model, making the states of the hidden units conditional on the metadata. The thesis compares the conditional replicated softmax model to other state-of-the-art topic models on the NIPS conference proceedings from 1987 to 1999. The learned topics appear richer and more interpretable than those of Dirichlet multinomial regression, but comparable to those of replicated softmax. However, the added complexity of the new model was associated with higher test perplexity, which measures a model's ability to predict unseen documents from a test set, and higher penalized perplexity, which penalizes perplexity for model complexity.
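
As a rough illustration of the idea in the abstract, the sketch below shows one way a metadata input layer could condition the hidden units of a replicated softmax model, together with the usual per-word perplexity. It is not taken from the thesis: the names `hidden_probs`, `per_word_perplexity`, `W`, `U`, and `a` are hypothetical, and the additive metadata term `m @ U` is an assumed parameterization. Only the scaling of the hidden bias by document length follows the standard replicated softmax formulation. The penalized perplexity is not sketched because its definition is not given here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, m, W, U, a):
    """Activation probabilities of the hidden (topic) units for one document.

    v: (K,) word-count vector over a vocabulary of K words
    m: (L,) metadata vector, e.g. a one-hot publication year (assumed encoding)
    W: (K, F) word-to-hidden weights
    U: (L, F) metadata-to-hidden weights (assumed form of the extra input layer)
    a: (F,) hidden biases, scaled by document length as in replicated softmax
    """
    doc_length = v.sum()
    return sigmoid(v @ W + m @ U + doc_length * a)

def per_word_perplexity(log_probs, doc_lengths):
    """exp of the negative mean per-word log-probability over test documents."""
    log_probs = np.asarray(log_probs, dtype=float)
    doc_lengths = np.asarray(doc_lengths, dtype=float)
    return float(np.exp(-np.mean(log_probs / doc_lengths)))

# Toy usage: with zero word weights and biases, only the metadata moves the hidden units.
v = np.array([2.0, 0.0, 1.0])              # 3-word vocabulary, document of length 3
W = np.zeros((3, 2))
a = np.zeros(2)
U = np.array([[1.0, -1.0], [0.0, 0.0]])    # two metadata categories, two hidden units
p_first = hidden_probs(v, np.array([1.0, 0.0]), W, U, a)
p_second = hidden_probs(v, np.array([0.0, 1.0]), W, U, a)
```

In this toy setup the same word counts give different hidden activations under the two metadata vectors, which is the sense in which the hidden states become conditional on the metadata.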

Keywords

conditional replicated softmax model, metadata, topic modelling
