PPoPP 2019
Sat 16 - Wed 20 February 2019 Washington, DC, United States

Efficient inference for deep learning models is challenging and of great value in both academia and industry. In this paper, we focus on exploiting sparsity in input data to improve the performance of deep learning models. We propose an end-to-end optimization pipeline that generates programs for inference with sparse input. The pipeline combines domain-specific and general optimization techniques and is capable of generating efficient code without relying on off-the-shelf libraries. Evaluations show that we achieve significant speedups over state-of-the-art frameworks and libraries on a real-world application, e.g., $9.8\times$ over TensorFlow and $3.6\times$ over Intel MKL on object detection for autonomous driving.
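The abstract does not detail the pipeline, but the core idea of exploiting input sparsity can be illustrated with a minimal sketch: store a mostly-zero matrix in a compressed format (here CSR) so the matrix-vector product only touches nonzero entries. All function names below are hypothetical and not from the paper.

```python
# Illustrative sketch (not the paper's pipeline): computing y = A @ x when the
# input matrix A is sparse, using the CSR (compressed sparse row) layout so
# that only nonzero entries do work.

def dense_to_csr(a):
    """Convert a dense 2-D list into CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector, skipping zeros entirely."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A mostly-zero input: only 3 of 12 entries are nonzero, so the inner loop
# runs 3 times instead of 12.
A = [[0, 2, 0, 0],
     [0, 0, 0, 0],
     [1, 0, 0, 3]]
x = [1.0, 2.0, 3.0, 4.0]
print(csr_matvec(*dense_to_csr(A), x))  # → [4.0, 0.0, 13.0]
```

The paper's pipeline goes further than a fixed sparse kernel: it generates code specialized to the sparsity in the input, rather than calling a generic library routine.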