Harmonia: A High Throughput B+tree for GPUs (PPoPP 2019 - Main Conference)

Who

Zhaofeng Yan, Yuzhe Lin, Lu Peng, Weihua Zhang

Track

PPoPP 2019 Main Conference

Time Zone

The program is currently displayed in (GMT-05:00) Cancun.

Use conference time zone: (GMT-05:00) CancunSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Mon 18 Feb 2019 16:10 - 16:35 at Salon 12/13 - Session 4: GPU B-Trees Chair(s): Ang Li

Abstract

B+tree is one of the most important data structures and has been widely used in different fields. With the increase of concurrent queries and data-scale in storage, designing an efficient B+tree structure has become critical. Due to abundant computation resources, GPUs provide potential opportunities to achieve high query throughput for B+tree. However, prior methods cannot achieve satisfactory performance results due to low resource utilization and poor memory performance.

In this paper, we first identify the gaps between B+tree and GPUs. Concurrent B+tree queries involve many global memory accesses and different divergences, which mismatch with GPU features. Based on this observation, we propose Harmonia, a novel B+tree structure to bridge the gap. In Harmonia, a B+tree structure is divided into a key region and a child region. The key region stores the nodes with its keys in a breadth-first order. The child region is organized as a prefix-sum array, which only stores each node’s first child index in the key region. Since the prefix-sum child region is small and the children’s index can be retrieved through index computations, most of it can be stored in on-chip caches, which can achieve good cache locality. To make it more efficient, Harmonia also includes two optimizations: partially-sorted aggregation and narrowed thread-group traversal, which can mitigate memory and warp divergence and improve resource utilization. Evaluations on a TITAN V GPU show that Harmonia can achieve up to 3.6 billion queries per second, which is about 3.4X faster than that of HB+Tree, a recent state-of-the-art GPU solution.

DOI

https://doi.org/10.1145/3293883.3295704

Zhaofeng Yan

Fudan University

Yuzhe Lin

Fudan University

Lu Peng

Weihua Zhang

Fudan University

Time Zone

The program is currently displayed in (GMT-05:00) Cancun.

Use conference time zone: (GMT-05:00) CancunSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Mon 18 Feb
Displayed time zone: Cancun change

16:10 - 17:00	Session 4: GPU B-TreesMain Conference at Salon 12/13 Chair(s): Ang Li Pacific Northwest National Laboratory

16:10 25m Talk		Harmonia: A High Throughput B+tree for GPUs Main Conference Zhaofeng Yan Fudan University, Yuzhe Lin Fudan University, Lu Peng , Weihua Zhang Fudan University DOI
16:35 25m Talk		Engineering a High-Performance GPU B-Tree Main Conference Muhammad Awad , Saman Ashkiani University of California, Davis, Rob Johnson VMWare Research, Martin Farach-Colton Rutgers University, John D. Owens University of California, Davis DOI