PPoPP 2019
Sat 16 - Wed 20 February 2019 Washington, DC, United States

Communication overhead is a well-known performance bottleneck in distributed Stochastic Gradient Descent (SGD), a widely used optimization algorithm for large-scale machine learning. In this work, we propose a practical and effective technique, named Adaptive Periodic Parameter Averaging, that reduces the communication overhead of distributed SGD without impairing its convergence properties.
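
To make the idea concrete, the following is a minimal NumPy sketch of periodic parameter averaging for distributed SGD, simulated on a toy least-squares problem: workers take local SGD steps without communicating and only periodically replace their parameters with the global average. The adaptive rule that lengthens the averaging period over time is a placeholder assumption for illustration, not the criterion proposed in the paper; all names (`period`, `num_workers`, etc.) are likewise illustrative.

```python
# Minimal sketch: local SGD with periodic parameter averaging, simulated
# with NumPy on a toy least-squares objective f(w) = ||A w - b||^2.
# The adaptive-period rule below is a placeholder, NOT the paper's rule.
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim, steps, lr = 4, 10, 200, 0.05

# Each simulated worker holds its own shard of synthetic data.
A = [rng.normal(size=(50, dim)) for _ in range(num_workers)]
b = [rng.normal(size=50) for _ in range(num_workers)]
w = [np.zeros(dim) for _ in range(num_workers)]  # per-worker parameters

period = 1            # average every `period` local steps
since_last_avg = 0

for t in range(steps):
    # Local SGD step on every worker (no communication here).
    for k in range(num_workers):
        grad = 2.0 * A[k].T @ (A[k] @ w[k] - b[k]) / len(b[k])
        w[k] -= lr * grad
    since_last_avg += 1

    if since_last_avg >= period:
        # Communication round: each worker's parameters are replaced
        # by the global average (the "parameter averaging" step).
        w_avg = np.mean(w, axis=0)
        w = [w_avg.copy() for _ in range(num_workers)]
        since_last_avg = 0

        # Placeholder adaptive rule (assumption): gradually lengthen the
        # period so that fewer rounds are spent on communication.
        period = min(period + 1, 10)

final_loss = np.mean([np.mean((A[k] @ w[0] - b[k]) ** 2) for k in range(num_workers)])
print("final loss:", final_loss)
```

In a real deployment the averaging step would be a collective operation (e.g., an all-reduce over worker parameters) rather than an in-process mean, and the period would be adapted according to the technique described in the paper.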