PPoPP 2019
Sat 16 - Wed 20 February 2019, Washington, DC, United States
Mon 18 Feb 2019, 10:55 - 11:20 at Salon 12/13 - Session 2: Heterogeneous Platforms and GPU. Chair(s): Xu Liu

Throughput-oriented architectures, such as GPUs, can sustain three orders of magnitude more concurrent threads than multicore architectures. This level of concurrency pushes typical synchronization primitives (e.g., mutexes) over their scalability limits, creating significant performance bottlenecks in modules, such as memory allocators, that use them. In this paper, we develop concurrent programming techniques and synchronization primitives, in support of a dynamic memory allocator, that are efficient for use with very high levels of concurrency.

We formulate resource allocation as a two-stage process that decouples accounting for the number of available resources from tracking the available resources themselves. To facilitate the accounting stage, we introduce a novel bulk semaphore abstraction that extends traditional semaphore semantics by optimizing for the case where many threads operate on the semaphore simultaneously. We similarly design new collective synchronization primitives that enable groups of cooperating threads to enter critical sections together. Finally, we show that delegating deferred reclamation to threads that are already blocked greatly improves efficiency.
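
The bulk semaphore lends itself to warp-level aggregation of concurrent operations. The following CUDA sketch is only an illustration of that accounting idea under assumptions of ours, not the paper's implementation; the names g_free_count and bulk_acquire_one are invented for this example, each thread is assumed to request exactly one resource unit, and thread blocks are assumed to be one-dimensional. The semaphore value merely counts free resources; which resources a successful group actually receives is resolved by the separate tracking stage.

// Hedged sketch: warp-aggregated acquire on a counting semaphore (CUDA 9+).
// Not the authors' code; names and policy choices are assumptions.
__device__ int g_free_count = 1 << 20;   // assumed initial number of free resources

// Each calling thread tries to reserve one resource unit. Threads of a warp
// that arrive together combine their requests, so a single atomic operation
// serves up to 32 concurrent acquires.
__device__ bool bulk_acquire_one()
{
    unsigned mask   = __activemask();      // lanes acquiring right now
    int      total  = __popc(mask);        // combined request of the warp
    int      leader = __ffs(mask) - 1;     // lowest active lane issues the atomic
    int      lane   = threadIdx.x & 31;    // lane id (1D thread blocks assumed)

    int old = 0;
    if (lane == leader)
        old = atomicSub(&g_free_count, total);  // one atomic for the whole group
    old = __shfl_sync(mask, old, leader);       // broadcast the old count

    if (old >= total)
        return true;                            // enough resources for every lane

    // Not enough resources: undo the reservation. A real bulk semaphore would
    // instead block the group or defer it rather than spin on retries.
    if (lane == leader)
        atomicAdd(&g_free_count, total);
    return false;
}

Aggregating the warp's requests into one atomicSub is what keeps the accounting stage scalable at GPU levels of concurrency: contention on the counter grows with the number of warps rather than the number of threads.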

Using all these techniques, our throughput-oriented memory allocator delivers both high allocation rates and low memory fragmentation on modern GPUs. Our experiments demonstrate that it achieves allocation rates that are, on average, 16.56 times higher than those of the counterpart implementation in the CUDA 9 toolkit.

Mon 18 Feb

Displayed time zone: Guadalajara, Mexico City, Monterrey

10:55 - 12:35
Session 2: Heterogeneous Platforms and GPU (Main Conference) at Salon 12/13
Chair(s): Xu Liu (College of William and Mary)

10:55 (25m, Talk)
Throughput-Oriented GPU Memory Allocation
Isaac Gelado (NVIDIA), Michael Garland (NVIDIA Research)

11:20 (25m, Talk)
SEP-Graph: Finding Shortest Execution Paths for Graph Processing under a Hybrid Framework on GPU
Hao Wang (The Ohio State University, USA), Liang Geng (The Ohio State University, USA), Rubao Lee (United Parallel Computing Corporation, USA), Kaixi Hou (Virginia Tech, USA), Yanfeng Zhang, Xiaodong Zhang (The Ohio State University, USA)

11:45 (25m, Talk)
Incremental Flattening for Nested Data Parallelism
Troels Henriksen (University of Copenhagen, Denmark), Frederik Thorøe (DIKU, University of Copenhagen), Martin Elsman (University of Copenhagen, Denmark), Cosmin Oancea (University of Copenhagen, Denmark)

12:10 (25m, Talk)
Adaptive Sparse Matrix-Matrix Multiplication on the GPU
Martin Winter (Graz University of Technology, Austria), Daniel Mlakar (Graz University of Technology, Austria), Rhaleb Zayer (Max Planck Institute for Informatics), Hans-Peter Seidel (Max Planck Institute for Informatics), Markus Steinberger (Graz University of Technology, Austria)