Options
GPU centric extensions for parallel strongly connected components computation
Date Issued
12-03-2016
Author(s)
Abstract
Finding Strongly Connected Components (SCC) of a directed graph is a fundamental graph problem. Many of the state-of-the-art sequential algorithms use depth-first search (DFS) to find SCCs. Since, in general DFS is hard to parallelize, researchers rely on other approaches such as Forward-Backward-Trim and Coloring. In this work, we have extended the two state-of-the-art multicore parallel algorithms to take advantage of the highly powerful GPU devices. Unfortunately, state-of-the-art parallel implementations have three performance limitations: (i) selection of ine ective pivots for running multiple forward-backward, (ii) load imbalance across threads while computing reachability closures, and (iii) serializability bottleneck while computing forward- backward sets. We address the first limitation using an improved pivot selection scheme. This improves the chances of finding the largest SCC in real-world graphs. We handle the other two limitations by adding parallelism during closure computation. This improves load balance across warpthreads and reduces thread-serialization. In e ect, the resultant codes perform much better compared to the state-of-the-art. The evaluation results show that our methods achieve up to 8 × speedup over Tarjan's sequential algorithm and up to 2 × speedup over a previous CUDA implementation.