PPoPP 2019
Sat 16 - Wed 20 February 2019 Washington, DC, United States

GPUs are designed to work well on structured data. Parallelizing graph algorithms on GPUs is challenging due to the irregular memory accesses involved in graph traversals. In particular, three important GPU-specific aspects affect performance: coalescing, memory latency, and thread-divergence. In this work, we tame these challenges by injecting approximations. In particular, we improve coalescing by renumbering and replicating nodes, memory latency by adding edges among specific nodes brought into shared memory, and thread-divergence by normalizing degrees across nodes assigned to a warp. Using a suite of graphs with varied characteristics and five popular algorithms, we demonstrate the effectiveness of our proposed techniques. Our approximations for coalescing, memory latency and thread-divergence lead to mean speedups of 1.3$\times$, 1.41$\times$ and 1.06$\times$ achieving accuracies of 83%, 78% and 84%, respectively.