CuLDA_CGS: Solving Large-scale LDA Problems on GPUs (PPoPP 2019 - Posters)

Sat 16 - Wed 20 February 2019 Washington, DC, United States

Who

Xiaolong Xie, Eric Liang, Xiuhong Li, Wei Tan

Track

PPoPP 2019 Posters

Abstract

Latent Dirichlet Allocation (LDA) is a popular topic model. Because the input corpus of an LDA algorithm consists of millions to billions of tokens, LDA training is very time-consuming, which prevents the adoption of LDA in many scenarios, e.g., online services. GPUs have benefited modern machine learning algorithms and big data analysis because they provide high memory bandwidth and computation power. Many frameworks, e.g., TensorFlow, Caffe, and CNTK, therefore support GPUs for accelerating various data-intensive machine learning algorithms. However, we observe that existing LDA solutions on GPUs are unsatisfactory.

In this paper, we present CuLDA_CGS, an efficient and scalable GPU-based approach to accelerating large-scale LDA problems. CuLDA_CGS is designed to solve LDA problems efficiently at high throughput. To this end, we first carefully design a workload partitioning and synchronization mechanism to exploit multiple GPUs. Then, we offload the LDA sampling process to each individual GPU, optimizing it from the sampling algorithm, parallelization, and data compression perspectives. Experimental evaluations show that CuLDA_CGS outperforms state-of-the-art LDA solutions by a large margin (up to 7.3X) on a single GPU, and achieves an additional 3.0X speedup on 4 GPUs. To the best of our knowledge, CuLDA_CGS is the first LDA solution that scales to multiple GPUs.
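For readers unfamiliar with the sampling process that the abstract refers to, the following is a minimal single-threaded sketch of textbook collapsed Gibbs sampling for LDA, assuming that "CGS" in the name stands for collapsed Gibbs sampling. It is not the authors' GPU implementation and includes none of the partitioning, synchronization, or compression optimizations mentioned above. The function name `lda_cgs`, its parameters, and the default hyperparameters are hypothetical choices made for illustration.

```python
import random


def lda_cgs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=20, seed=0):
    """Textbook collapsed Gibbs sampling for LDA (illustrative sketch only).

    docs is a list of documents, each a list of integer word ids
    in the range [0, vocab_size).
    """
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Assign a random initial topic to every token.
    assign = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic
                # given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Draw the new topic and add it back into the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, topic_total
```

Each token update touches only a handful of counters, yet the loop visits every token in every iteration, which illustrates why corpora with millions to billions of tokens make training expensive and why the per-token sampling step is a natural target for GPU parallelization.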

Xiaolong Xie

Peking University

Eric Liang

Peking University

Xiuhong Li

Peking University

Wei Tan

Citadel LLC

Session Program

Sun 17 Feb
Displayed time zone: (GMT-05:00) Guadalajara, Mexico City, Monterrey

	18:00 - 20:00	Welcome Reception and Poster Session (Main Conference) at Mezzanine Foyer