PPoPP 2019
Sat 16 - Wed 20 February 2019 Washington, DC, United States

Asynchronous task-based programming models are gaining popularity to address programmability and performance challenges in high performance computing. One of the main attractions of these models and runtimes is their potential to automatically expose and exploit overlap of computation with communication. However, inefficient interactions between such programming models and the underlying messaging layer (in most cases, MPI) limit the achievable computation-communication overlap and negatively impact the performance of parallel programs. We propose to expose information about MPI internals to a task-based runtime system to make better scheduling decisions. In particular, we show how existing mechanisms used to profile MPI implementations can be used to share information between MPI and a task-based runtime. Further, an evaluation of the proposed method shows performance improvements of up to 30.7% for applications with collective communication.