Compilation techniques for nested-parallel applications that can adapt to hardware and dataset characteristics are vital for unlocking the power of modern hardware. This paper proposes such a technique, which builds on flattening and is applied in the context of a functional data-parallel language. Our solution uses the degree of utilized parallelism as the driver for generating a multitude of code versions, which together cover all possible mappings of the application’s regular nested parallelism to the levels of parallelism supported by the hardware. These code versions are then combined into one program by guarding them with predicates, whose threshold values are automatically tuned to hardware and dataset characteristics.
Our unsupervised method, which statically clusters datasets to code versions, differs from autotuning work, which typically searches for the combination of code transformations that produces a single version, best suited either to a specific dataset or to all datasets on average.
By fully integrating our technique into a compiler for the Futhark programming language, we demonstrate significant performance gains on two GPUs for three real-world applications from the financial domain and for six Rodinia benchmarks.
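The idea of guarding several code versions with tuned threshold predicates can be illustrated with a small sketch. This is not the Futhark compiler's implementation: the row-sum workload, the two version names, the single threshold `t_outer`, and the `cost` callback are all assumptions made for illustration, and the "parallel" versions are simulated sequentially in plain Python.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def sum_rows_outer_only(xss: Matrix) -> List[float]:
    # Version 1 (assumed): exploit only the outer level of parallelism.
    # Each row would map to one thread, which reduces its row sequentially.
    out = []
    for xs in xss:
        acc = 0.0
        for x in xs:
            acc += x
        out.append(acc)
    return out


def sum_rows_flattened(xss: Matrix) -> List[float]:
    # Version 2 (assumed): flatten both levels into a segmented reduction,
    # so that every element could be processed in parallel.
    flat = [x for xs in xss for x in xs]
    seg_ids = [i for i, xs in enumerate(xss) for _ in xs]
    sums = [0.0] * len(xss)
    for seg, x in zip(seg_ids, flat):
        sums[seg] += x
    return sums


def choose_version(outer_size: int, t_outer: int) -> str:
    # The guard: compare the degree of outer parallelism with a threshold.
    return "outer_only" if outer_size >= t_outer else "flattened"


def sum_rows(xss: Matrix, t_outer: int) -> Tuple[str, List[float]]:
    # One program containing both versions, selected at run time.
    version = choose_version(len(xss), t_outer)
    if version == "outer_only":
        return version, sum_rows_outer_only(xss)
    return version, sum_rows_flattened(xss)


def tune_threshold(
    datasets: Sequence[Matrix],
    candidates: Sequence[int],
    cost: Callable[[str, Matrix], float],
) -> int:
    # Hypothetical tuner: pick the candidate threshold with the lowest total
    # cost over the training datasets. `cost` stands in for a measurement.
    def total(t: int) -> float:
        return sum(cost(choose_version(len(xss), t), xss) for xss in datasets)

    return min(candidates, key=total)
```

Both versions compute the same result. Only the mapping of the nested parallelism differs, so the threshold affects performance but not correctness.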
Mon 18 Feb (displayed time zone: Guadalajara, Mexico City, Monterrey)
10:55 - 12:35 | Session 2: Heterogeneous Platforms and GPU | Main Conference at Salon 12/13 | Chair(s): Xu Liu (College of William and Mary)
10:55 | 25m | Talk | Throughput-Oriented GPU Memory Allocation | Main Conference
11:20 | 25m | Talk | SEP-Graph: Finding Shortest Execution Paths for Graph Processing under a Hybrid Framework on GPU | Main Conference | Hao Wang (The Ohio State University, USA), Liang Geng (The Ohio State University, USA), Rubao Lee (United Parallel Computing Corporation, USA), Kaixi Hou (Virginia Tech, USA), Yanfeng Zhang, Xiaodong Zhang (The Ohio State University, USA)
11:45 | 25m | Talk | Incremental Flattening for Nested Data Parallelism | Main Conference | Troels Henriksen (University of Copenhagen, Denmark), Frederik Thorøe (DIKU, University of Copenhagen), Martin Elsman (University of Copenhagen, Denmark), Cosmin Oancea (University of Copenhagen, Denmark)
12:10 | 25m | Talk | Adaptive Sparse Matrix-Matrix Multiplication on the GPU | Main Conference | Martin Winter (Graz University of Technology, Austria), Daniel Mlakar (Graz University of Technology, Austria), Rhaleb Zayer (Max Planck Institute for Informatics), Hans-Peter Seidel (Max Planck Institute for Informatics), Markus Steinberger (Graz University of Technology, Austria)