Sub-policy Adaptation for Hierarchical Reinforcement Learning

Alexander Li, Carlos Florensa, Ignasi Clavera, Pieter Abbeel

Keywords: hierarchical reinforcement learning, optimization, policy gradient, reinforcement learning, robustness, skill discovery, transfer learning


Abstract: Hierarchical reinforcement learning is a promising approach for tackling long-horizon decision-making problems with sparse rewards. Unfortunately, most methods still decouple the lower-level skill acquisition process from the training of a higher-level policy that controls the skills in a new task. Leaving the skills fixed can lead to significant sub-optimality in the transfer setting. In this work, we propose a novel algorithm to discover a set of skills and to continuously adapt them, along with the higher level, even when training on a new task. Our main contributions are two-fold. First, we derive a new hierarchical policy gradient with an unbiased latent-dependent baseline, and we introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy jointly. Second, we propose a method of training time abstractions that improves the robustness of the obtained skills to environment changes. Code and videos are available at sites.google.com/view/hippo-rl.
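
To make the abstract's two ingredients concrete, the sketch below shows what jointly optimizing both levels of a two-level policy with a clipped (PPO-style) surrogate and a latent-dependent baseline b(s, z) might look like. This is a hedged illustration based only on the abstract, not the authors' implementation: the discrete skill and action spaces, the network shapes, and all names (TwoLevelPolicy, hippo_style_loss, etc.) are assumptions. For simplicity it scores the higher level at every step, whereas the actual method commits to a skill for a (randomized) number of steps.

```python
# Illustrative sketch of joint two-level training with a latent-dependent
# baseline and PPO-style clipping. All details are assumptions from the
# abstract, not the paper's actual algorithm or code.
import torch
import torch.nn as nn

class TwoLevelPolicy(nn.Module):
    def __init__(self, obs_dim, n_skills, n_actions, hidden=64):
        super().__init__()
        # Higher level: outputs logits over latent skills z given the state.
        self.manager = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_skills))
        # Lower level: outputs action logits conditioned on (state, skill).
        self.sub = nn.Sequential(
            nn.Linear(obs_dim + n_skills, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions))
        # Latent-dependent baseline b(s, z): conditions on the active skill.
        self.baseline = nn.Sequential(
            nn.Linear(obs_dim + n_skills, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def log_probs(self, obs, skill, skill_onehot, action):
        # Log-probability of the chosen skill under the higher level.
        lp_hi = torch.distributions.Categorical(
            logits=self.manager(obs)).log_prob(skill)
        # Log-probability of the action under the skill-conditioned lower level.
        x = torch.cat([obs, skill_onehot], dim=-1)
        lp_lo = torch.distributions.Categorical(
            logits=self.sub(x)).log_prob(action)
        return lp_hi, lp_lo

def hippo_style_loss(policy, obs, skill, skill_onehot, action,
                     old_lp_hi, old_lp_lo, returns, eps=0.2):
    """Clip the higher- and lower-level likelihood ratios separately and use
    an advantage computed against the latent-dependent baseline b(s, z)."""
    lp_hi, lp_lo = policy.log_probs(obs, skill, skill_onehot, action)
    b = policy.baseline(torch.cat([obs, skill_onehot], dim=-1)).squeeze(-1)
    adv = returns - b.detach()
    loss = torch.zeros(())
    for lp, old_lp in ((lp_hi, old_lp_hi), (lp_lo, old_lp_lo)):
        ratio = torch.exp(lp - old_lp)          # importance ratio vs. old policy
        clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
        loss = loss - torch.min(ratio * adv, clipped * adv).mean()
    # Baseline regression term (a common choice; the paper may differ).
    return loss + ((b - returns) ** 2).mean()
```

The sketch omits pieces the abstract implies but does not spell out, such as committing to a skill for a randomized number of steps (the time-abstraction mechanism said to improve robustness) and any advantage estimator like GAE; those choices are left to the paper.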

Similar Papers

Composing Task-Agnostic Policies with Deep Reinforcement Learning
Ahmed H. Qureshi, Jacob J. Johnson, Yuzhe Qin, Taylor Henderson, Byron Boots, Michael C. Yip
Option Discovery using Deep Skill Chaining
Akhil Bagaria, George Konidaris
Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, Yoshua Bengio