Tutorial: Implementing Parallel and Concurrent Tree Structures
Recently, the advent and development of shared-memory multi-core machine have improved the computation ability to process in-memory large-scale data in parallel. As such, it is of great interest to have simple and efficient parallel data structures for programmers to easily organize and process data. One of the most important data structures for organizing data is the tree structure. This tutorial will introduce general techniques to support parallelism and concurrency in trees, as well as some state-of-the-art tree structures. Attendees will learn how to write simple and efficient parallel algorithms for trees. In particular, this tutorial will introduce an algorithmic framework for parallel balanced binary trees, which bases all tree algorithms on a single primitive Join. This framework is extendable to at least four balancing schemes: AVL trees, red-black trees, weight-balance trees, and treaps, and all algorithms except Join are generic across balancing schemes. Based on this Join-based framework, this tutorial will address techniques including a wide range of algorithms, concurrency, an augmentation framework, persistence (meaning to yield a new version on updating), multi-versioning, garbage collection and compressed representation. The tree structure is lock-free and all algorithms on trees are work-efficient with poly-logarithmic parallel depth.
This parallel tree framework is integrated into a C++ library called PAM (Parallel Augmented Maps). This tutorial will also discuss how to use the framework of the PAM library to solve real-world problems. The tree structure is extendable to a variety of applications in different domains with ease, which is achieved by using an abstract data type (ADT) called the augmented map. Designed as a general-purpose library for parallel tree structures, this library can be directly applied to many applications including 2D range/segment/rectangle search, inverted index searching, HTAP database systems, multi-version concurrency control, graph processing systems, and so on. Making use of the library as bottom-level implementation, each of the applications only needs about one hundred lines of high-level code to get highly-optimized implementations.
The advantage of the Join-based framework and the library lies in 1) in the generality in algorithm design and 2) the generality for multiple real-world problems. Firstly, the versatility and generality reduce the incremental effort for users to adapt our tree structure simply as a black box to their own applications. Furthermore, this minimizes the coding effort to re-create the tree structure with special requirements when necessary (in another programming language, for example).
This tutorial will also have a hands-on introduction to the download/installation of the library. We will show code examples on how to use the framework and the library. This tutorial will finally show comparisons among different systems and tree structures under different workloads and applications, and show analysis on the favored properties for trees under different scenarios. The library is available at https://github.com/cmuparlay/PAM. There are also related pages on Wikipedia site:
- Augmented maps: https://en.wikipedia.org/wiki/Augmented_map
- The PAM library: https://en.wikipedia.org/wiki/PAM_library
- The join-based algorithms: https://en.wikipedia.org/wiki/Join-based_tree_algorithms
- General techniques
- Algorithms (most of them are also parallel) that are generic across multiple balancing schemes. Examples: insertion, deletion, construction, union, split, multi-insertion, map-reduce, filter, etc.
- Support augmentation on trees.
- The definition of the augmented map as an abstract data type, and several examples of modeling real-world applications using augmented maps.
- Support persistence on trees.
- Multi-version concurrency control on trees.
- Garbage collection techniques on multi-versioned tree structures.
- Compressed representations on tree structures.
- The parallel library PAM.
- The interface and how to write code on top of it.
- Several examples of optimizations.
- Walkthrough on download/installation.
- Code for toy examples.
- Examples of applications utilizing combinations of functionalities.
- 2D range searching problem.
- Dynamic inverted index searching.
- Dynamic graph processing.
- Single-writer concurrency.
- Performance comparison between state-of-the-art tree structures, including concurrent tree structures, tree structures for geometric algorithms, etc.