Double Neural Counterfactual Regret Minimization

Hui Li; Kailiang Hu; Shaohua Zhang; Yuan Qi; Le Song

Double Neural Counterfactual Regret Minimization

Hui Li, Kailiang Hu, Shaohua Zhang, Yuan Qi, Le Song

Keywords: memory, optimization, reinforcement learning

Abstract Paper Reviews Chat

Mon Session 1 (05:00-07:00 GMT) [Live QA] [Cal]

Mon Session 2 (08:00-10:00 GMT) [Live QA] [Cal]

Abstract: Counterfactual regret minimization (CFR) is a fundamental and effective technique for solving Imperfect Information Games (IIG). However, the original CFR algorithm only works for discrete states and action spaces, and the resulting strategy is maintained as a tabular representation. Such tabular representation limits the method from being directly applied to large games. In this paper, we propose a double neural representation for the IIGs, where one neural network represents the cumulative regret, and the other represents the average strategy. Such neural representations allow us to avoid manual game abstraction and carry out end-to-end optimization. To make the learning efficient, we also developed several novel techniques including a robust sampling method and a mini-batch Monte Carlo Counterfactual Regret Minimization (MCCFR) method, which may be of independent interests. Empirically, on games tractable to tabular approaches, neural strategies trained with our algorithm converge comparably to their tabular counterparts, and significantly outperform those based on deep reinforcement learning. On extremely large games with billions of decision nodes, our approach achieved strong performance while using hundreds of times less memory than the tabular CFR. On head-to-head matches of hands-up no-limit texas hold'em, our neural agent beat the strong agent ABS-CFR by $9.8\pm4.1$ chips per game. It's a successful application of neural CFR in large games.

Double Neural Counterfactual Regret Minimization

Hui Li, Kailiang Hu, Shaohua Zhang, Yuan Qi, Le Song

Similar Papers

Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information

Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu,

Model Based Reinforcement Learning for Atari

Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłos, Błażej Osiński, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski,

On Bonus Based Exploration Methods In The Arcade Learning Environment

Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare,

A Generalized Training Approach for Multiagent Learning

Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos,