Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Qian Long, Zihan Zhou, Abhinav Gupta, Fei Fang, Yi Wu†, Xiaolong Wang†

Keywords: curriculum learning, fine-tuning, multi-agent reinforcement learning, reinforcement learning

Thursday Session 1 (05:00-07:00 GMT)
Thursday Session 3 (12:00-14:00 GMT)

Abstract: In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large. In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. Furthermore, EPC uses an evolutionary approach to fix an objective misalignment issue throughout the curriculum: agents successfully trained in an early stage with a small population are not necessarily the best candidates for adapting to later stages with scaled populations. Concretely, EPC maintains multiple sets of agents in each stage, performs mix-and-match and fine-tuning over these sets and promotes the sets of agents with the best adaptability to the next stage. We implement EPC on a popular MARL algorithm, MADDPG, and empirically show that our approach consistently outperforms baselines by a large margin as the number of agents grows exponentially. The source code and videos can be found at https://sites.google.com/view/epciclr2020.
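The abstract describes a stage-wise loop: maintain several sets of agents, grow the population each stage, mix-and-match agents across sets, fine-tune the resulting larger sets, and promote the best-adapting sets to the next stage. The following is a minimal, illustrative sketch of that loop, not the authors' implementation: here an "agent" is just a toy parameter vector, train_stage is a stand-in for MADDPG fine-tuning, and adaptability is a placeholder for the evaluation score used to rank candidate sets.

import random
from typing import List

Agent = List[float]          # toy stand-in for an agent's policy parameters
AgentSet = List[Agent]

def mix_and_match(sets: List[AgentSet], new_size: int, num_offspring: int) -> List[AgentSet]:
    """Form larger candidate sets by sampling agents across the parent sets."""
    pool = [agent for agent_set in sets for agent in agent_set]
    return [[random.choice(pool) for _ in range(new_size)] for _ in range(num_offspring)]

def train_stage(agent_set: AgentSet) -> AgentSet:
    """Placeholder for MADDPG fine-tuning in the scaled environment."""
    return [[w + random.gauss(0, 0.01) for w in agent] for agent in agent_set]

def adaptability(agent_set: AgentSet) -> float:
    """Placeholder for the evaluation return used to rank candidate sets."""
    return -sum(abs(w) for agent in agent_set for w in agent)

def epc(num_stages: int = 3, sets_kept: int = 3, offspring: int = 6,
        init_size: int = 2, scale: int = 2) -> AgentSet:
    """Stage-wise curriculum: grow the population each stage, then
    mix-and-match, fine-tune, and keep only the best-adapting sets."""
    candidate_sets = [[[random.gauss(0, 1) for _ in range(4)] for _ in range(init_size)]
                      for _ in range(sets_kept)]
    size = init_size
    for _ in range(num_stages):
        size *= scale                                       # scale up the population
        children = mix_and_match(candidate_sets, size, offspring)
        children = [train_stage(s) for s in children]       # fine-tune at the new scale
        children.sort(key=adaptability, reverse=True)       # evolutionary selection
        candidate_sets = children[:sets_kept]               # promote the fittest sets
    return candidate_sets[0]

if __name__ == "__main__":
    best = epc()
    print(f"final population size: {len(best)} agents")

With the default settings the population doubles each stage (2, 4, 8, 16 agents), mirroring the exponential growth the abstract refers to; all constants and helper names above are assumptions for illustration only.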
