Options
Factorized Graph Representations for Semi-Supervised Learning from Sparse Data
Date Issued
14-06-2020
Author(s)
Kumar, Krishna P.
Langton, Paul
Gatterbauer, Wolfgang
Abstract
Node classification is an important problem in graph data management. It is commonly solved by various label propagation methods that iteratively pass messages along edges, starting from a few labeled seed nodes. For graphs with arbitrary compatibilities between classes, these methods crucially depend on knowing the compatibility matrix, which must thus be provided by either domain experts or heuristics. We instead suggest a principled and scalable method for directly estimating the compatibilities from a sparsely labeled graph. This method, which we call distant compatibility estimation, works even on extremely sparsely labeled graphs (e.g., 1 in 10,000 nodes is labeled) in a fraction of the time it later takes to label the remaining nodes. Our approach first creates multiple factorized graph representations (with size independent of the graph) and then performs estimation on these smaller graph sketches. We refer to algebraic amplification as the underlying idea of leveraging algebraic properties of an algorithm's update equations to amplify sparse signals in data. We show that our estimator is by orders of magnitude faster than alternative approaches and that the end-to-end classification accuracy is comparable to using gold standard compatibilities. This makes it a cheap pre-processing step for any existing label propagation method and removes the current dependence on heuristics.