Tiling is a key technique for data locality optimization and is widely used in high-performance implementations of dense matrix-matrix multiplication for multicore/manycore CPUs and GPUs. However, the irregular, matrix-dependent data access pattern of sparse matrix multiplication makes it challenging to use tiling to enhance data reuse. In this paper, we devise an adaptive tiling strategy and apply it to enhance the performance of two primitives: SpMM (the product of a sparse matrix and a dense matrix) and SDDMM (sampled dense-dense matrix multiplication). In contrast to studies that have resorted to non-standard sparse-matrix representations to enhance performance, we use the standard Compressed Sparse Row (CSR) representation, within which intra-row reordering is performed to enable adaptive tiling. Experimental evaluation using an extensive set of matrices from the SuiteSparse collection demonstrates significant performance improvement over currently available state-of-the-art alternatives.
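The idea of intra-row reordering within standard CSR can be illustrated with a small sequential sketch. It is a simplified model under stated assumptions, not the authors' implementation: rows are grouped into fixed-height row panels (`panel_rows`, a hypothetical parameter), a column counts as "heavy" in a panel when at least `threshold` (also hypothetical) of the panel's nonzeros fall in it, and each row's heavy nonzeros are moved to the front of that row. The matrix keeps the same `indptr` and remains ordinary CSR; only the order of entries within each row changes. The tiled SpMM then stages the rows of `B` for a panel's heavy columns once (a Python dict standing in for a shared-memory or cache tile) and reuses them across all rows of the panel, while the remaining "light" nonzeros read `B` directly.

```python
from collections import Counter


def spmm_reference(indptr, indices, data, B):
    # Plain CSR SpMM (C = A @ B): each nonzero A[i, j] scales row j of B into row i of C.
    n, k = len(indptr) - 1, len(B[0])
    C = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            for t in range(k):
                C[i][t] += data[p] * B[indices[p]][t]
    return C


def reorder_rows(indptr, indices, data, panel_rows, threshold):
    # Hypothetical intra-row reordering: within each row, nonzeros whose column is
    # "heavy" in the row's panel move to the front. indptr is unchanged (still CSR).
    n = len(indptr) - 1
    new_idx, new_val = list(indices), list(data)
    heavy_count = [0] * n   # number of heavy nonzeros at the start of each row
    heavy_cols = []         # sorted heavy-column list for each panel
    for start in range(0, n, panel_rows):
        rows = range(start, min(start + panel_rows, n))
        counts = Counter(indices[p] for i in rows
                         for p in range(indptr[i], indptr[i + 1]))
        heavy = {j for j, c in counts.items() if c >= threshold}
        heavy_cols.append(sorted(heavy))
        for i in rows:
            lo, hi = indptr[i], indptr[i + 1]
            entries = list(zip(indices[lo:hi], data[lo:hi]))
            # Stable sort: heavy entries first, original order kept within each group.
            entries.sort(key=lambda e: e[0] not in heavy)
            new_idx[lo:hi] = [j for j, _ in entries]
            new_val[lo:hi] = [v for _, v in entries]
            heavy_count[i] = sum(1 for j, _ in entries if j in heavy)
    return new_idx, new_val, heavy_count, heavy_cols


def spmm_tiled(indptr, indices, data, B, panel_rows=2, threshold=2):
    idx, val, heavy_count, heavy_cols = reorder_rows(
        indptr, indices, data, panel_rows, threshold)
    n, k = len(indptr) - 1, len(B[0])
    C = [[0.0] * k for _ in range(n)]
    for panel, start in enumerate(range(0, n, panel_rows)):
        # Stage the B rows of this panel's heavy columns once; every row reuses them.
        tile = {j: B[j] for j in heavy_cols[panel]}
        for i in range(start, min(start + panel_rows, n)):
            lo, hi = indptr[i], indptr[i + 1]
            split = lo + heavy_count[i]
            for p in range(lo, split):      # heavy segment: read from the staged tile
                row_b = tile[idx[p]]
                for t in range(k):
                    C[i][t] += val[p] * row_b[t]
            for p in range(split, hi):      # light segment: read B directly
                row_b = B[idx[p]]
                for t in range(k):
                    C[i][t] += val[p] * row_b[t]
    return C


# Small 4x4 example: columns 0 and 2 are shared by both rows of the first panel.
indptr = [0, 3, 5, 6, 8]
indices = [1, 0, 2, 2, 0, 3, 3, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
assert spmm_tiled(indptr, indices, data, B) == spmm_reference(indptr, indices, data, B)
```

The sketch covers only SpMM and runs sequentially; on a GPU each panel's tile would be loaded into shared memory by a thread block, and the panel height and threshold would need tuning per matrix. SDDMM, which evaluates dot products of dense rows only at the nonzero positions of the sparse matrix, is not shown.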
Tue 19 Feb
15:45 - 16:10
Martin Küttler (TU Dresden), Maksym Planeta (TU Dresden, Germany), Jan Bierbaum (TU Dresden), Carsten Weinhold (TU Dresden), Hermann Härtig (TU Dresden), Amnon Barak (The Hebrew University of Jerusalem), Torsten Hoefler (ETH Zurich)
16:10 - 16:35
Changwan Hong, Aravind Sukumaran-Rajam (Ohio State University, USA), Israt Nisa, Kunal Singh (The Ohio State University), P. Sadayappan (Ohio State University)