Learning Efficient Parameter Server Synchronization Policies for Distributed SGD

Rong Zhu, Sheng Yang, Andreas Pfadler, Zhengping Qian, Jingren Zhou

Keywords: distributed, gradient descent, reinforcement learning

Mon Session 1 (05:00-07:00 GMT) [Live QA] [Cal]
Mon Session 2 (08:00-10:00 GMT) [Live QA] [Cal]

Abstract: We apply a reinforcement learning (RL) based approach to learning optimal synchronization policies used for Parameter Server-based distributed training of machine learning models with Stochastic Gradient Descent (SGD). Utilizing a formal synchronization policy description in the PS-setting, we are able to derive a suitable and compact description of states and actions, allowing us to efficiently use the standard off-the-shelf deep Q-learning algorithm. As a result, we are able to learn synchronization policies which generalize to different cluster environments, different training datasets and small model variations and (most importantly) lead to considerable decreases in training time when compared to standard policies such as bulk synchronous parallel (BSP), asynchronous parallel (ASP), or stale synchronous parallel (SSP). To support our claims we present extensive numerical results obtained from experiments performed in simulated cluster environments. In our experiments training time is reduced by 44 on average and learned policies generalize to multiple unseen circumstances.

Similar Papers

Synthesizing Programmatic Policies that Inductively Generalize
Jeevana Priya Inala, Osbert Bastani, Zenna Tavares, Armando Solar-Lezama,
Never Give Up: Learning Directed Exploration Strategies
Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martin Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell,
Composing Task-Agnostic Policies with Deep Reinforcement Learning
Ahmed H. Qureshi, Jacob J. Johnson, Yuzhe Qin, Taylor Henderson, Byron Boots, Michael C. Yip,