Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards

Allan Zhou; Eric Jang; Daniel Kappler; Alex Herzog; Mohi Khansari; Paul Wohlhart; Yunfei Bai; Mrinal Kalakrishnan; Sergey Levine; Chelsea Finn

Abstract: Imitation learning allows agents to learn complex behaviors from demonstrations. However, learning a complex vision-based task may require an impractical number of demonstrations. Meta-imitation learning is a promising approach towards enabling agents to learn a new task from one or a few demonstrations by leveraging experience from learning similar tasks. In the presence of task ambiguity or unobserved dynamics, demonstrations alone may not provide enough information; an agent must also try the task to successfully infer a policy. In this work, we propose a method that can learn to learn from both demonstrations and trial-and-error experience with sparse reward feedback. In comparison to meta-imitation, this approach enables the agent to effectively and efficiently improve itself autonomously beyond the demonstration data. In comparison to meta-reinforcement learning, we can scale to substantially broader distributions of tasks, as the demonstration reduces the burden of exploration. Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks.

Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards

Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn

Similar Papers

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Siddharth Reddy, Anca D. Dragan, Sergey Levine,

Improving Generalization in Meta Reinforcement Learning using Learned Objectives

Louis Kirsch, Sjoerd van Steenkiste, Juergen Schmidhuber,

State Alignment-based Imitation Learning

Fangchen Liu, Zhan Ling, Tongzhou Mu, Hao Su,

State-only Imitation with Transition Dynamics Mismatch

Tanmay Gangwani, Jian Peng,