Complexity guided noise filtering in QA repositories
Date Issued
01-01-2017
Author(s)
Abstract
Filtering out noisy sentences in an answer, that is, sentences irrelevant to the question being asked, increases the utility and reuse of a Question-Answer (QA) repository. Filtering such sentences can be difficult for traditional supervised classification methods because of the extensive labelling effort involved. In this paper, we propose a semi-supervised learning approach in which we first infer a set of topics on the corpus using Latent Dirichlet Allocation (LDA). We then label the topics automatically using a small labelled set and use them to classify an unseen sentence as useful or noisy. We performed experiments on a real-life help desk dataset and found that the results are comparable to those of other semi-supervised learning methods.
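The pipeline outlined in the abstract (infer topics, label them from a small labelled set, classify unseen sentences) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how topics are labelled or how a sentence is mapped to a topic, so the majority-vote labelling, the single-dominant-topic assumption, the "noisy" default for topics that receive no votes, the hyperparameters, and the toy help desk sentences are all assumptions, and every function name is hypothetical. LDA is fitted here with a small collapsed Gibbs sampler so that the example needs only the standard library.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_lda(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (vocab, phi), where phi[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        assign.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = assign[d][i], vocab[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
         for w in range(n_words)]
        for k in range(n_topics)
    ]
    return vocab, phi


def dominant_topic(sentence, vocab, phi):
    """Assumption: a sentence belongs to the single topic that gives its
    known words the highest log-likelihood (unknown words are ignored)."""
    ids = [vocab[w] for w in tokenize(sentence) if w in vocab]
    scores = [sum(math.log(row[i]) for i in ids) for row in phi]
    return max(range(len(phi)), key=scores.__getitem__)


def label_topics(labelled, vocab, phi, default="noisy"):
    """Assumption: each topic takes the majority label of the labelled
    sentences assigned to it; topics with no votes get `default`."""
    votes = [Counter() for _ in phi]
    for sentence, label in labelled:
        votes[dominant_topic(sentence, vocab, phi)][label] += 1
    return [v.most_common(1)[0][0] if v else default for v in votes]


def classify(sentence, vocab, phi, topic_labels):
    """Classify an unseen sentence by the label of its dominant topic."""
    return topic_labels[dominant_topic(sentence, vocab, phi)]


# Toy, made-up help desk sentences (not from the paper's dataset).
corpus = [
    "Restart the router and check the network cable",
    "Reset the password from the account settings page",
    "Thank you for contacting the help desk",
    "Have a nice day and thank you for your patience",
    "Install the printer driver and restart the printer",
    "Please let us know if you need anything else thank you",
]
labelled = [
    ("Restart the router and check the network cable", "useful"),
    ("Thank you for contacting the help desk", "noisy"),
]

vocab, phi = fit_lda([tokenize(s) for s in corpus], n_topics=2)
topic_labels = label_topics(labelled, vocab, phi)
print(classify("Restart the printer and check the cable", vocab, phi, topic_labels))
```

Only the two sentences in `labelled` carry labels; the rest of the corpus contributes to topic inference alone, which is where the saving in labelling effort would come from. On a corpus this small the inferred topics are unstable, so the printed label should not be read as meaningful.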