PPoPP 2019
Sat 16 - Wed 20 February 2019 Washington, DC, United States

In this paper, we investigate the effectiveness of multiprocessor architectures with ISA-different cores for executing HPC workloads. Our envisioned design point in the heterogeneous architecture space is one with multiple cache-coherency domains, where each domain hosts cores of a different ISA and there is no coherency between domains. We prototype such an architecture using an Intel Xeon x86-64 server and a Cavium ThunderX ARMv8 server, interconnected by a high-speed network fabric. We design, implement, and evaluate policies for scheduling HPC applications with the goal of minimizing workload makespan. Our results reveal that such an architecture is most effective for workloads that exhibit diverse execution times on ISA-different CPUs, with gains exceeding 60% over ISA-homogeneous architectures. Furthermore, cross-ISA execution migration can yield gains of up to 38%.
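
To make the notion of makespan concrete, the following is a minimal sketch of a greedy earliest-completion-time scheduler for two ISA-different machines. It is not the scheduling policy evaluated in the paper; the function name `greedy_makespan`, the machine labels, and the per-job execution times are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def greedy_makespan(
    jobs: Dict[str, Tuple[float, float]],
) -> Tuple[float, Dict[str, str]]:
    """Place each job on 'x86' or 'arm' and return (makespan, placement).

    jobs maps a job name to (time_on_x86, time_on_arm). These times are
    assumed to be known in advance, which is an assumption of this sketch.
    """
    finish = {"x86": 0.0, "arm": 0.0}  # time at which each machine becomes idle
    placement: Dict[str, str] = {}

    # Consider jobs in decreasing order of their best-case execution time.
    ordered = sorted(jobs.items(), key=lambda kv: -min(kv[1]))
    for name, (t_x86, t_arm) in ordered:
        times = {"x86": t_x86, "arm": t_arm}
        # Pick the machine on which this job would complete earliest.
        best = min(finish, key=lambda m: finish[m] + times[m])
        finish[best] += times[best]
        placement[name] = best

    # Makespan is the completion time of the last job to finish.
    return max(finish.values()), placement


# Hypothetical workload: jobs "a" and "b" strongly prefer different ISAs.
example_jobs = {"a": (10.0, 40.0), "b": (40.0, 10.0), "c": (15.0, 25.0)}
makespan, placement = greedy_makespan(example_jobs)
print(makespan, placement)
```

In this made-up workload, the job that runs much faster on ARM is placed on the ARM machine while the other two share the x86 machine, which illustrates why diverse per-ISA execution times leave room for a scheduler to shorten the makespan.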