Conditional Replicated Softmax for Topic Modelling with Metadata

Title: Conditional Replicated Softmax for Topic Modelling with Metadata
Author: Austria, Charles
Department: Department of Mathematics and Statistics
Program: Mathematics and Statistics
Advisor: Ali, Ayesha
Abstract: Topic models are popular tools for extracting semantic topics from text documents. Documents often come with metadata such as authors, dates, or publication venues; however, current state-of-the-art topic models do not incorporate metadata. This thesis introduces the conditional replicated softmax model, an undirected graphical model that models document word counts and document-specific metadata using restricted Boltzmann machines. An additional input layer associated with the metadata is added to the replicated softmax model, thereby making the states of the hidden units conditional upon the metadata. This thesis compares the conditional replicated softmax model to other state-of-the-art topic models on the NIPS conference proceedings from 1987 to 1999. The learned topics appear richer and more interpretable than those of Dirichlet multinomial regression, but comparable to those of replicated softmax. However, the added complexity of the new model was associated with higher test perplexity, which scores a model's ability to predict unseen documents from a test set, and higher penalized perplexity, which penalizes perplexity for model complexity.
URI: http://hdl.handle.net/10214/17481
Date: 2019-09-18
Terms of Use: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
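
The architecture described in the abstract (a metadata input layer that conditions the hidden units of a replicated softmax model) can be illustrated with a minimal sketch. This is not taken from the thesis: the function name, the weight matrices W and U, the bias vector b, and in particular the assumption that the metadata enters as an unscaled additive term are all hypothetical. Only the word-count part follows the standard replicated softmax form, in which the hidden bias is scaled by document length.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hidden_probabilities(word_counts, metadata, W, U, b):
    """Return P(h_j = 1 | word counts, metadata) for each hidden unit j.

    word_counts: list of K word counts for one document
    metadata:    list of M metadata features (e.g. one-hot year or author)
    W:           K x H word-to-hidden weights (hypothetical name)
    U:           M x H metadata-to-hidden weights (hypothetical name)
    b:           H hidden biases, scaled by document length as in
                 the standard replicated softmax model
    """
    doc_length = sum(word_counts)
    probs = []
    for j in range(len(b)):
        word_term = sum(W[k][j] * word_counts[k] for k in range(len(word_counts)))
        # Assumption: metadata contributes an additive, unscaled term.
        meta_term = sum(U[m][j] * metadata[m] for m in range(len(metadata)))
        probs.append(_sigmoid(doc_length * b[j] + word_term + meta_term))
    return probs


# Toy example: 3 vocabulary words, 2 metadata features, 2 hidden units.
W = [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1]]
U = [[0.5, 0.0], [0.0, -0.5]]
b = [0.0, 0.0]
counts = [2, 0, 1]

without_meta = hidden_probabilities(counts, [0, 0], W, U, b)
with_meta = hidden_probabilities(counts, [1, 0], W, U, b)
```

In this toy setup, switching on the first metadata feature raises the activation probability of the first hidden unit and leaves the second unchanged, which is the sense in which the hidden states become conditional on the metadata.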


Files in this item

File: Austria_Charles_201909_Msc.pdf
Size: 541.4Kb
Format: PDF
