A phrase-based text representation approach for effective retrieval of web documents
Date Issued
01-12-2003
Author(s)
Sharma, Rupali
Raman, S.
Abstract
The Internet has made it possible to search for documents on the web irrespective of their physical location. Because most documents available on the web are machine-readable but not machine-understandable, retrieving relevant information remains a difficult task. Effective text retrieval is, in essence, also a matter of efficient text representation. This paper presents and discusses a phrase-based model for text representation that uses rule-based Natural Language Processing (NLP) techniques to extract key-phrases from a text document through partial parsing. NLP techniques are used both to preprocess the documents, extracting the content-carrying terms, and to process a user's request to identify important search terms. Phrasal indexing aims to reduce the ambiguity inherent in words considered in isolation and thereby to improve retrieval effectiveness. The representation approach has been evaluated from a retrieval point of view, and the system achieves 81% precision and 83% recall.
Volume
2
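The abstract describes rule-based key-phrase extraction by partial parsing, phrasal indexing, and evaluation by precision and recall, but it does not give the actual parsing rules. The following is a minimal sketch under stated assumptions, not the authors' method: input tokens are assumed to be already part-of-speech tagged (hand-tagged here in place of a real tagger), and the chunking rule (a maximal run of adjectives and nouns that ends in a noun) is an illustrative stand-in. The names `extract_phrases`, `build_index`, `search`, and `precision_recall`, and the toy documents, are hypothetical.

```python
from collections import defaultdict

# Assumed chunking rule (illustrative only): a key-phrase is a maximal run of
# ADJ/NOUN tokens, trimmed so that it ends in a NOUN.
CONTENT_TAGS = {"ADJ", "NOUN"}


def extract_phrases(tagged):
    """Partially parse a list of (word, tag) pairs into key-phrases."""
    phrases, run = [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag in CONTENT_TAGS:
            run.append((word.lower(), tag))
            continue
        # Drop trailing adjectives so every phrase ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


def build_index(docs):
    """Phrasal inverted index: phrase -> set of document ids."""
    index = defaultdict(set)
    for doc_id, tagged in docs.items():
        for phrase in extract_phrases(tagged):
            index[phrase].add(doc_id)
    return index


def search(index, tagged_query):
    """Return documents sharing at least one key-phrase with the query."""
    hits = set()
    for phrase in extract_phrases(tagged_query):
        hits |= index.get(phrase, set())  # .get does not insert missing keys
    return hits


def precision_recall(retrieved, relevant):
    """Precision = |ret & rel| / |ret|; recall = |ret & rel| / |rel|."""
    found = len(retrieved & relevant)
    precision = found / len(retrieved) if retrieved else 0.0
    recall = found / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical hand-tagged toy collection (stands in for a POS tagger's output).
DOCS = {
    "d1": [("Phrasal", "ADJ"), ("indexing", "NOUN"), ("reduces", "VERB"),
           ("word", "NOUN"), ("ambiguity", "NOUN")],
    "d2": [("Web", "NOUN"), ("documents", "NOUN"), ("are", "VERB"),
           ("machine-readable", "ADJ")],
    "d3": [("The", "DET"), ("retrieval", "NOUN"), ("of", "ADP"),
           ("web", "NOUN"), ("documents", "NOUN")],
}
QUERY = [("web", "NOUN"), ("documents", "NOUN")]
RELEVANT = {"d3"}  # assumed relevance judgement for the toy query

index = build_index(DOCS)
retrieved = search(index, QUERY)
print(sorted(retrieved), precision_recall(retrieved, RELEVANT))
```

Matching whole phrases such as "web documents" rather than the isolated words "web" and "documents" is what lets a phrasal index separate senses that single-word indexing would conflate. In this toy run, the query retrieves d2 and d3; with d3 judged relevant, precision is 0.5 and recall is 1.0.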