A GPU Memory Efficient Speed-up Scheme for Training Ultra-deep Neural Networks
Ultra-deep neural networks (UDNNs) tend to yield higher-quality models, but their training process is often difficult to handle. Scarce GPU DRAM capacity is the primary bottleneck that limits network depth and the range of trainable minibatch sizes. In this paper, we present a scheme dedicated to making the utmost use of the finite GPU memory resource to speed up the training of UDNNs. First, a performance-model-guided dynamic swap-out/in strategy between GPU and host memory is carefully orchestrated to tackle the out-of-memory problem without introducing a performance penalty. Then, a hyperparameter (minibatch size, learning rate) tuning policy is designed to explore the optimal configuration after the swap strategy is applied, considering training time and final accuracy simultaneously. Finally, we verify the effectiveness of our scheme in both single-GPU and distributed multi-GPU settings.
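To make the idea of swap-out/in concrete, the following is a minimal sketch, not the paper's implementation, of offloading saved activations to pinned host memory during the forward pass and fetching them back on demand during the backward pass, using PyTorch's saved-tensor hooks. The size threshold and helper names (e.g. pack_to_host, unpack_to_device) are illustrative assumptions; the paper's performance model, which decides which tensors to swap and schedules the transfers so they overlap with computation, is not reproduced here.

```python
# Sketch only: swap large saved activations out to pinned host memory on the
# forward pass and swap them back in on demand during the backward pass.
import torch
import torch.nn as nn

SWAP_THRESHOLD_BYTES = 1 << 20  # assumed threshold: only swap tensors >= 1 MiB


def pack_to_host(t: torch.Tensor):
    """Swap-out: copy a saved activation to a pinned host buffer."""
    if t.is_cuda and t.numel() * t.element_size() >= SWAP_THRESHOLD_BYTES:
        host = torch.empty(t.shape, dtype=t.dtype, device="cpu", pin_memory=True)
        host.copy_(t, non_blocking=True)  # D2H copy enqueued on the current stream
        return ("cpu", host, t.device)
    return ("gpu", t, t.device)


def unpack_to_device(packed):
    """Swap-in: move the activation back to its original GPU before backward uses it."""
    where, t, dev = packed
    return t.to(dev, non_blocking=True) if where == "cpu" else t


if __name__ == "__main__":
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)]).to(dev)
    x = torch.randn(32, 1024, device=dev)
    # Every tensor saved for backward inside this context goes through the hooks.
    with torch.autograd.graph.saved_tensors_hooks(pack_to_host, unpack_to_device):
        loss = model(x).square().mean()
    loss.backward()
```

A full implementation along the lines the abstract describes would additionally issue these copies on a dedicated CUDA stream and use the performance model to decide, per tensor, whether swapping can be hidden behind computation rather than adding latency.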