SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski

Keywords: distributed, policy gradient, reinforcement learning, scalability

Tuesday Session 3 (12:00-14:00 GMT)
Tuesday Session 5 (20:00-22:00 GMT)
Tuesday: RL and Estimation

Abstract: We present a modern scalable reinforcement learning agent called SEED (Scalable, Efficient Deep-RL). By effectively utilizing modern accelerators, we show that it is not only possible to train on millions of frames per second but also to lower the cost of experiments compared to current methods. We achieve this with a simple architecture that features centralized inference and an optimized communication layer. SEED adopts two state-of-the-art distributed algorithms, IMPALA/V-trace (policy gradients) and R2D2 (Q-learning), and is evaluated on Atari-57, DeepMind Lab and Google Research Football. We improve the state of the art on Football and are able to reach state of the art on Atari-57 twice as fast in wall-time. For the scenarios we consider, a 40% to 80% cost reduction for running experiments is achieved. The implementation, along with experiments, is open-sourced so that results can be reproduced and novel ideas tried out.
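
The sketch below illustrates the centralized-inference idea from the abstract in plain Python, using threads and queues. It is a minimal sketch under stated assumptions, not the open-sourced SEED RL code: `InferenceServer`, `actor_loop`, and the gym-style `env.reset()`/`env.step()` interface are hypothetical stand-ins, and the in-process queue stands in for SEED's gRPC-based communication layer.

```python
# Minimal sketch of SEED-style centralized inference (illustrative names,
# not the open-sourced SEED RL implementation). Actors only step
# environments; a central server batches their observations into single
# policy calls, so the model lives on one accelerator, not on every actor.
import queue
import threading

import numpy as np


class InferenceServer:
    """Batches observations from many actors into single policy calls."""

    def __init__(self, policy, max_batch_size):
        self.policy = policy            # callable: obs batch -> action batch
        self.max_batch_size = max_batch_size
        self.requests = queue.Queue()   # (observation, reply_queue) pairs

    def serve_forever(self):
        while True:
            # Block for one request, then greedily fill the rest of the batch.
            obs, reply = self.requests.get()
            batch, replies = [obs], [reply]
            while len(batch) < self.max_batch_size:
                try:
                    obs, reply = self.requests.get_nowait()
                except queue.Empty:
                    break
                batch.append(obs)
                replies.append(reply)
            actions = self.policy(np.stack(batch))  # one accelerator call
            for action, reply in zip(actions, replies):
                reply.put(action)


def actor_loop(env, server):
    """Pure environment stepping: the actor holds no copy of the model."""
    reply = queue.Queue(maxsize=1)
    obs = env.reset()
    while True:
        server.requests.put((obs, reply))   # in SEED this hop is gRPC
        obs, _reward, done, _info = env.step(reply.get())
        if done:
            obs = env.reset()


# Illustrative wiring: a random 4-action policy and one server thread.
server = InferenceServer(policy=lambda b: np.random.randint(0, 4, len(b)),
                         max_batch_size=64)
threading.Thread(target=server.serve_forever, daemon=True).start()
```

The design point this illustrates: because actors never hold model parameters, they can be cheap CPU-only processes, and the expensive forward pass is batched on the accelerator alongside training instead of being duplicated across actors.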

Similar Papers

DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations
Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, G. Edward Suh, Zhiru Zhang
IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica
Batch-shaping for learning conditional channel gated networks
Babak Ehteshami Bejnordi, Tijmen Blankevoort, Max Welling