Meta-Q-Learning

Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola

Keywords: meta reinforcement learning, off-policy, reinforcement learning

Mon Session 4 (17:00-19:00 GMT)
Mon Session 5 (20:00-22:00 GMT)
Monday: Meta-learning

Abstract: This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, a multi-task objective to maximize the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of available data for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with the state of the art in meta-RL.
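Below is a minimal, illustrative sketch (not the authors' released code) of the third idea in the abstract: recycling meta-training replay-buffer data for off-policy adaptation via propensity estimation. It assumes a logistic classifier is trained to distinguish new-task transitions from buffer transitions, and uses its odds ratio as an importance weight; the feature dimensions, clipping threshold, and helper names are assumptions made here for illustration.

```python
# Hedged sketch of propensity-based re-weighting of replay-buffer data, in the
# spirit of the abstract's description. Not the authors' implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression


def propensity_weights(new_task_feats, buffer_feats, clip=10.0):
    """Estimate importance weights beta(x) ~ p_new(x) / p_buffer(x).

    new_task_feats: (n_new, d) features of transitions collected on the new task.
    buffer_feats:   (n_buf, d) features of transitions from the meta-training buffer.
    Returns an array of shape (n_buf,) used to re-weight buffer transitions
    in off-policy updates during adaptation.
    """
    X = np.vstack([new_task_feats, buffer_feats])
    y = np.concatenate([np.ones(len(new_task_feats)), np.zeros(len(buffer_feats))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_new = clf.predict_proba(buffer_feats)[:, 1]        # P(label = new task | x)
    beta = p_new / np.clip(1.0 - p_new, 1e-6, None)      # odds ratio as importance weight
    return np.clip(beta, 0.0, clip)


def effective_sample_size(beta):
    """Normalized ESS in [0, 1]; values near 1 mean buffer data resembles the new task."""
    return (beta.sum() ** 2) / (len(beta) * (beta ** 2).sum() + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    buf = rng.normal(0.0, 1.0, size=(5000, 8))   # stand-in for buffer transition features
    new = rng.normal(0.3, 1.0, size=(200, 8))    # stand-in for new-task transition features
    w = propensity_weights(new, buf)
    print("mean weight:", w.mean(), "ESS:", effective_sample_size(w))
```

The effective sample size of the weights indicates how much the recycled buffer data can be trusted for adaptation on the new task; the exact way MQL combines this signal with its off-policy updates is described in the paper itself.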

Similar Papers

Improving Generalization in Meta Reinforcement Learning using Learned Objectives
Louis Kirsch, Sjoerd van Steenkiste, Juergen Schmidhuber

Meta-Learning without Memorization
Mingzhang Yin, George Tucker, Mingyuan Zhou, Sergey Levine, Chelsea Finn

Automated Relational Meta-learning
Huaxiu Yao, Xian Wu, Zhiqiang Tao, Yaliang Li, Bolin Ding, Ruirui Li, Zhenhui Li