Document classification by topic labeling
Date Issued
02-09-2013
Author(s)
Hingmire, Swapnil
Chougule, Sandeep
Palshikar, Girish K.
Indian Institute of Technology, Madras
Abstract
In this paper, we propose a Latent Dirichlet Allocation (LDA) [1] based document classification algorithm that does not require any labeled dataset. In our algorithm, we construct a topic model using LDA, assign each topic to one of the class labels, aggregate all topics that share a class label into a single topic using the aggregation property of the Dirichlet distribution, and then automatically assign a class label to each unlabeled document depending on its "closeness" to one of the aggregated topics. We also present an extension to our algorithm based on a combination of the Expectation-Maximization (EM) algorithm and a naive Bayes classifier. We show the effectiveness of our algorithm on three real-world datasets. Copyright © 2013 ACM.
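The aggregation and labeling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the LDA model has already produced a document's topic proportions, and it interprets "closeness" as the largest aggregated proportion, which is an assumption made here for illustration. The names `aggregate_by_label`, `classify`, `topic_labels`, and `doc_topic`, and the example values, are hypothetical.

```python
from collections import defaultdict


def aggregate_by_label(doc_topic, topic_labels):
    """Sum a document's topic proportions over topics sharing a class label.

    By the aggregation property of the Dirichlet distribution, summing
    components of a Dirichlet-distributed vector yields another Dirichlet
    vector, so the per-label sums remain valid proportions.
    """
    totals = defaultdict(float)
    for topic_id, proportion in enumerate(doc_topic):
        totals[topic_labels[topic_id]] += proportion
    return dict(totals)


def classify(doc_topic, topic_labels):
    """Pick the label with the largest aggregated proportion (assumed
    stand-in for the "closeness" criterion)."""
    totals = aggregate_by_label(doc_topic, topic_labels)
    return max(totals, key=totals.get)


# Hypothetical input: four LDA topics, each assigned one class label.
topic_labels = ["sports", "politics", "sports", "politics"]
# Hypothetical topic proportions inferred for one unlabeled document.
doc_topic = [0.30, 0.45, 0.25, 0.00]

print(classify(doc_topic, topic_labels))
```

In this example no single "sports" topic dominates the document, yet the two "sports" topics together outweigh the "politics" topics, which is why aggregation happens before the label is chosen.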