Options
Latent dirichlet allocation based multi-document summarization
Date Issued
16-12-2008
Author(s)
Arora, Rachit
Ravindran, Balaraman
Abstract
Extraction based Multi-Document Summarization Algorithms consist of choosing sentences from the documents using some weighting mechanism and combining them into a, summary. In this article we use Latent Dirichlet Allocation to capture the events being covered by the documents and form the summary with sentences representing these different events. Our approach is distinguished from existing approaches in that we use mixture models to capture the topics and pick up the sentences without paying attention to the details of grammar and structure of the documents. Finally we present the evaluation of the algorithms on the DUC 2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures. Copyright 2008 ACM.