In the ongoing efforts to vectorize linear algebra primitives, sparse matrix-matrix multiplication (SpGEMM) has received considerably less attention than sparse matrix-vector multiplication (SpMV). While both are equally important, this disparity can be attributed mainly to the additional, formidable challenges that SpGEMM raises.
In this paper, we present a dynamic approach for addressing SpGEMM on the GPU. Our approach works directly on the standard compressed sparse row (CSR) data format. In comparison to previous SpGEMM implementations, our approach guarantees a homogeneous, load-balanced access pattern to the first input matrix and improves memory access to the second input matrix. It adaptively repurposes GPU threads during execution and maximizes the time during which efficient on-chip scratchpad memory can be used. Following a completely deterministic scheduling pattern, it guarantees bit-stable results across repeated executions, a property missing from other approaches. Evaluation on an extensive sparse matrix benchmark suggests that our approach is the fastest SpGEMM implementation for highly sparse matrices and, when bit-stable results are sought, the fastest across the entire test set.
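For readers unfamiliar with the operation, the following is a minimal sequential sketch of row-wise SpGEMM on CSR inputs (often called Gustavson's algorithm). It is not the GPU approach described in the abstract; it only illustrates what the operation computes and why a fixed accumulation order yields repeatable floating-point results. The function name `spgemm_csr` and the `(ptr, idx, val)` argument layout are assumptions made for this illustration.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Row-wise C = A * B, with A, B, and C all stored in CSR.

    Hypothetical reference helper: each matrix is given as three lists,
    row pointers (ptr), column indices (idx), and values (val).
    """
    c_ptr, c_idx, c_val = [0], [], []
    n_rows = len(a_ptr) - 1
    for i in range(n_rows):
        acc = {}  # column index -> partial sum for row i of C
        # Visit the nonzeros of row i of A in storage order.
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by a_ik and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Emit the row with sorted column indices.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0, 2],      B = [[0, 4],
#      [0, 3, 0]]           [5, 0],
#                           [6, 7]]
c = spgemm_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0],
               [0, 1, 2, 4], [1, 0, 0, 1], [4.0, 5.0, 6.0, 7.0])
print(c)  # ([0, 2, 3], [0, 1, 0], [12.0, 18.0, 15.0])
```

Because the partial products for each output entry are always added in the same order, repeated runs of this sketch produce identical bits. A parallel implementation that lets scheduling decide the order of additions does not have this property, since floating-point addition is not associative.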
Mon 18 Feb
|10:55 - 11:20|
|11:20 - 11:45|
Hao Wang (The Ohio State University, USA), Liang Geng (The Ohio State University, USA), Rubao Lee (United Parallel Computing Corporation, USA), Kaixi Hou (Virginia Tech, USA), Yanfeng Zhang, Xiaodong Zhang (The Ohio State University, USA)
|11:45 - 12:10|
Troels Henriksen (University of Copenhagen, Denmark), Frederik Thorøe (DIKU, University of Copenhagen), Martin Elsman (University of Copenhagen, Denmark), Cosmin Oancea (University of Copenhagen, Denmark)
|12:10 - 12:35|
Martin Winter (Graz University of Technology, Austria), Daniel Mlakar (Graz University of Technology, Austria), Rhaleb Zayer (Max Planck Institute for Informatics), Hans-Peter Seidel (Max Planck Institute for Informatics), Markus Steinberger (Graz University of Technology, Austria)