PPoPP 2019
Sat 16 - Wed 20 February 2019, Washington, DC, United States

Recursive divide-and-conquer algorithms are known to solve dynamic programming (DP) problems efficiently on shared-memory multicore machines. In this work, we extend them to also run efficiently on manycore GPUs and distributed-memory machines, without changing their basic structure.

Our GPU algorithms work efficiently even when the data is too large to fit into the host RAM. These are external-memory algorithms based on recursive $r$-way divide and conquer, where $r$ ($\geq 2$) varies with the current depth of the recursion. Our distributed-memory algorithms are also based on multi-way recursive divide and conquer, which extends naturally inside each shared-memory multicore/manycore compute node. We show that these algorithms are work-optimal and have low latency and bandwidth bounds.
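The abstract does not say which DP problems are targeted, so the following is only a hedged illustration of what recursive $r$-way divide and conquer with a depth-dependent $r$ can look like. It is a minimal serial, in-memory Python sketch for Floyd-Warshall all-pairs shortest paths, a DP we picked ourselves; it does not model GPUs, external memory, or distributed nodes. The per-depth `schedule` (for example $r = 4$ at the top, then 3, then 2), the `base` cutoff, and all function names are hypothetical. It assumes a zero diagonal and no negative cycles.

```python
import math
import random
from typing import List, Sequence, Tuple

INF = math.inf
Range = Tuple[int, int]  # half-open index range [lo, hi)


def split(rng: Range, r: int) -> List[Range]:
    """Cut [lo, hi) into min(r, hi - lo) contiguous, non-empty, near-equal parts."""
    lo, hi = rng
    r = max(1, min(r, hi - lo))
    cuts = [lo + (hi - lo) * t // r for t in range(r + 1)]
    return [(cuts[t], cuts[t + 1]) for t in range(r)]


def base_case(d: List[List[float]], I: Range, J: Range, K: Range) -> None:
    """Loop-based definition of one block update: k outermost, then i, then j."""
    for k in range(*K):
        dk = d[k]
        for i in range(*I):
            di = d[i]
            for j in range(*J):
                cand = di[k] + dk[j]
                if cand < di[j]:
                    di[j] = cand


def rway_fw(d, I: Range, J: Range, K: Range,
            schedule: Sequence[int], depth: int = 0, base: int = 4) -> None:
    """Update block d[I][J] using pivots k in K, by r-way recursion.

    At every depth the I, J and K ranges are either identical or disjoint,
    because they all come from the same recursive partition of [0, n).
    """
    if max(I[1] - I[0], J[1] - J[0], K[1] - K[0]) <= base:
        base_case(d, I, J, K)
        return
    # Hypothetical policy: r is read from a per-depth schedule, defaulting to 2.
    r = schedule[depth] if depth < len(schedule) else 2
    Is, Js, Ks = split(I, r), split(J, r), split(K, r)
    for kb in Ks:  # pivot blocks must be processed in increasing order
        blocks = [(ib, jb) for ib in Is for jb in Js]
        # Group 0: diagonal block (ib == jb == kb); group 1: row/column panels
        # (exactly one of ib, jb equals kb); group 2: everything else.
        # Blocks inside one group do not depend on each other.
        blocks.sort(key=lambda b: 2 - (b[0] == kb) - (b[1] == kb))
        for ib, jb in blocks:
            rway_fw(d, ib, jb, kb, schedule, depth + 1, base)


def floyd_warshall_rway(w, schedule=(4, 2), base=4):
    """Hypothetical entry point: r-way recursive Floyd-Warshall on a copy of w."""
    n = len(w)
    d = [row[:] for row in w]
    full = (0, n)
    rway_fw(d, full, full, full, schedule, 0, base)
    return d


def floyd_warshall_loops(w):
    """Reference: the plain triple loop over the whole matrix."""
    d = [row[:] for row in w]
    full = (0, len(d))
    base_case(d, full, full, full)
    return d


if __name__ == "__main__":
    random.seed(0)
    n = 37
    w = [[0 if i == j else (random.randint(1, 20) if random.random() < 0.3 else INF)
          for j in range(n)] for i in range(n)]
    print(floyd_warshall_rway(w, schedule=(4, 3, 2)) == floyd_warshall_loops(w))
```

The ordering inside each pivot block `kb` is the part that keeps the recursion equivalent to the plain loops: the diagonal block first, then the row and column panels, then the remaining blocks. Blocks within one group are independent, which is the kind of parallelism a multicore, GPU, or distributed runtime could exploit; this sketch simply runs them one after another.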

We also report empirical results for our algorithms.