Top-K query retrieval of combinations with sum-of-subsets ranking

Majumder, Subhashis; Sanyal, Biswajit; Gupta, Prosenjit; Sinha, Soumik; Pande, Shiladitya; Hon, Wing Kai

doi:10.1007/978-3-319-12691-3_36

Top-K query retrieval of combinations with sum-of-subsets ranking

Date Issued

01-01-2014

Author(s)

Majumder, Subhashis

Sanyal, Biswajit

Gupta, Prosenjit

Sinha, Soumik

Pande, Shiladitya

Hon, Wing Kai

DOI

10.1007/978-3-319-12691-3_36

Abstract

Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-k queries return a ranked set of the k best data objects selected on the basis of the ranks (scores) of the objects, assigned by any ranking (scoring) function. In this paper, we consider a problem on generation of combinations. Given a set S of n real numbers and an integer r ≤ n, we consider the (Formula Presented) different r-combinations of the elements of S. Let all these (Formula Presented) combinations be indicated by the set (Formula Presented). From this set C, given any positive integer (Formula Presented) our goal is to generate the k best combinations (top-k combinations) ranked on the basis of some aggregation function F. We consider Summation as the aggregation function. This calculates the sum of all the r real numbers in any combination Ci, and the ranking criterion is that a combination Ci is ranked higher than a combination Cj, if the sum of the constituent numbers of Ci is larger than that of Cj. For any given n and r (≤ n), we build a metadata structure G (basically a DAG), even before S and k are known. We can later use G to report the top-k combinations efficiently when S is available. We further present an alternative incremental method, where we generate only the required portions of G on demand, instead of constructing the whole G explicitly. It helps us to save the time and space overhead of the preprocessing phase.

Volume

8881

Options

Top-K query retrieval of combinations with sum-of-subsets ranking