PPoPP 2019
Sat 16 - Wed 20 February 2019 Washington, DC, United States

GPUs are widely used to accelerate deep learning with neural networks. However, because GPU memory capacity is limited, it is difficult to implement efficient programs that compute large neural networks on a GPU. To compute neural networks that exceed GPU memory capacity, existing work has proposed data-swapping and recomputing methods. These methods, however, incur performance overhead from data movement or from additional computation. To reduce this overhead, it is important to consider the characteristics of each layer, such as its size and its recomputation cost. Following this direction, we propose the Profiling-based out-of-core Hybrid method (PoocH). PoocH determines which layers to swap and which to recompute based on runtime profiling. We implemented PoocH by extending the deep learning framework Chainer and evaluated its performance. With PoocH, we successfully computed a neural network requiring 50 GB of memory on a single GPU with 16 GB of memory. Compared with in-core cases, performance degradation was 38% on an x86 machine and 28% on a POWER9 machine.
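
To illustrate the idea of choosing between swapping and recomputing per layer from profiled costs, here is a minimal sketch in Python. It is not PoocH's actual algorithm, which the abstract does not describe. It is a simple greedy heuristic under stated assumptions: each layer's activation size, swap overhead, and recomputation overhead have already been measured, and the names `LayerProfile`, `plan_layers`, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayerProfile:
    """Profiled data for one layer (all fields are hypothetical)."""
    name: str
    size_bytes: int        # activation memory held for the backward pass
    swap_cost: float       # measured overhead of swapping out and back in
    recompute_cost: float  # measured overhead of recomputing the forward pass


def plan_layers(profiles, budget_bytes):
    """Greedy heuristic (an assumption, not the published method).

    Start with every layer resident on the GPU. While the resident set
    exceeds the budget, evict the layer with the lowest overhead per byte
    freed, using whichever of swap or recompute is cheaper for that layer.
    """
    plan = {p.name: "keep" for p in profiles}
    resident = sum(p.size_bytes for p in profiles)

    # Layers with no activation memory free nothing, so skip them.
    candidates = [p for p in profiles if p.size_bytes > 0]
    candidates.sort(
        key=lambda p: min(p.swap_cost, p.recompute_cost) / p.size_bytes
    )

    for p in candidates:
        if resident <= budget_bytes:
            break
        plan[p.name] = "swap" if p.swap_cost <= p.recompute_cost else "recompute"
        resident -= p.size_bytes
    return plan


# Made-up numbers for illustration only.
example_profiles = [
    LayerProfile("conv1", size_bytes=400, swap_cost=4.0, recompute_cost=8.0),
    LayerProfile("relu1", size_bytes=400, swap_cost=4.0, recompute_cost=0.4),
    LayerProfile("conv2", size_bytes=200, swap_cost=3.0, recompute_cost=6.0),
]
example_plan = plan_layers(example_profiles, budget_bytes=500)
```

In this toy input, the cheap-to-recompute activation layer is recomputed, the larger convolution is swapped, and the remaining layer stays in GPU memory. A real planner would also need to account for how much swap time overlaps with computation, which this sketch ignores.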