Over the past decade, many programming languages and systems for parallel computing have been developed, e.g., Fork/Join and Habanero Java, Parallel Haskell, Parallel ML, and X10. Although these systems raise the level of abstraction for writing parallel codes, good performance still requires labor-intensive optimizations for coarsening the granularity of parallel executions. In this paper, we present provably and practically efficient techniques for controlling granularity within the run-time system of the language. Our starting point is “oracle-guided scheduling”, a result from the functional-programming community showing that granularity can be controlled by an “oracle” that can predict the execution time of parallel codes. We give an algorithm for implementing such an oracle and prove that it has the desired theoretical properties under the nested-parallel programming model. We implement the oracle in C++ by extending Cilk and evaluate its practical performance. The results show that our techniques can essentially eliminate hand tuning while closely matching the performance of hand-tuned codes.
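To make the idea of oracle-guided granularity control concrete, here is a minimal C++ sketch. It is not the algorithm from the paper and does not use Cilk. `CostEstimator`, `kGrainNs`, and `sum_range` are hypothetical names introduced only for illustration, `std::async` stands in for a Cilk-style spawn, and the running-average update and the 50 µs threshold are assumptions. The sketch shows only the basic decision: if the predicted running time of a subtask is below a threshold, run it sequentially and use the measured time to refine later predictions. Otherwise, split it and run the halves in parallel.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical oracle: predicts time as (abstract cost) * (ns per cost unit).
// The per-unit constant is refined from measured sequential runs.
class CostEstimator {
public:
    // Predicted running time in nanoseconds for a task of the given cost.
    double predict_ns(std::size_t cost) const {
        return ns_per_unit_.load(std::memory_order_relaxed) * static_cast<double>(cost);
    }

    // Fold one measured sequential run into the estimate (simple running
    // average; concurrent updates may overwrite each other, which only
    // loses a sample and is acceptable for a sketch).
    void report(std::size_t cost, double elapsed_ns) {
        if (cost == 0) return;
        double sample = elapsed_ns / static_cast<double>(cost);
        double old = ns_per_unit_.load(std::memory_order_relaxed);
        ns_per_unit_.store(0.5 * old + 0.5 * sample, std::memory_order_relaxed);
    }

private:
    std::atomic<double> ns_per_unit_{1.0};  // assumed initial guess
};

// Assumed threshold: tasks predicted to take less than this run sequentially.
constexpr double kGrainNs = 50000.0;

// Sums v[lo, hi), choosing between sequential and parallel execution
// based on the estimator's prediction.
long long sum_range(const std::vector<int>& v, std::size_t lo, std::size_t hi,
                    CostEstimator& est) {
    assert(lo <= hi && hi <= v.size());
    std::size_t n = hi - lo;
    if (n < 2 || est.predict_ns(n) <= kGrainNs) {
        // Predicted to be small: run sequentially and time the run.
        auto t0 = std::chrono::steady_clock::now();
        long long s = std::accumulate(v.begin() + lo, v.begin() + hi, 0LL);
        auto t1 = std::chrono::steady_clock::now();
        est.report(n, std::chrono::duration<double, std::nano>(t1 - t0).count());
        return s;
    }
    // Predicted to be large: split in half. With the default launch policy the
    // implementation decides whether the left half gets its own thread.
    std::size_t mid = lo + n / 2;
    auto left = std::async([&] { return sum_range(v, lo, mid, est); });
    long long right = sum_range(v, mid, hi, est);
    return left.get() + right;
}
```

A caller would create one `CostEstimator` per kind of computation and pass it to every call, so that measurements from earlier runs inform later sequential/parallel decisions. The programmer supplies only the abstract cost (here, the range length), not a hand-tuned cutoff.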
Tue 19 Feb. Displayed time zone: Guadalajara, Mexico City, Monterrey
10:55 - 12:35 | Session 6, Best Paper Candidates | Main Conference at Salon 12/13 | Chair(s): Rudolf Eigenmann (University of Delaware)
10:55 | 25m | Talk | Lightweight Hardware Transactional Memory Profiling | Main Conference | Qingsen Wang (College of William and Mary), Pengfei Su (College of William and Mary), Milind Chabbi (Uber Technologies), Xu Liu (College of William and Mary)
11:20 | 25m | Talk | A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs | Main Conference | Ke Meng, Jiajia Li (Georgia Institute of Technology, Pacific Northwest National Laboratory), Guangming Tan (Chinese Academy of Sciences (CAS)), Ninghui Sun (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences)
11:45 | 25m | Talk | Provably and Practically Efficient Granularity Control | Main Conference | Umut A. Acar (Carnegie Mellon University), Vitaly Aksenov (Inria & ITMO University), Arthur Charguéraud (Inria), Mike Rainey (Indiana University, USA)
12:10 | 25m | Talk | A Coordinated Tiling and Batching Framework for Efficient GEMM on GPUs | Main Conference | Xiuhong Li (Peking University), Eric Liang (Peking University), Shengen Yan (SenseTime), Jia Liancheng (Peking University), Yinghan Li (SenseTime)