Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

Abstract

Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction, but often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often requires a lot of high-quality demonstration data that is difficult to obtain, especially for domains such as robotics. Our approach consists of a reverse curriculum followed by a forward curriculum. Unique to our approach compared to past work is the ability to efficiently leverage more than one demonstration via a per-demonstration reverse curriculum generated via state resets. The result of our reverse curriculum is an initial policy that performs well on a narrow initial state distribution and helps overcome difficult exploration problems. A forward curriculum is then used to accelerate the training of the initial policy to perform well on the full initial state distribution of the task and improve demonstration and sample efficiency. We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency compared against various state-of-the-art learning-from-demonstration baselines, even solving previously unsolvable tasks that require high precision and control.
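To make the per-demonstration reverse curriculum concrete, the following is a minimal sketch of the bookkeeping such a curriculum could use. It is not the RFCL implementation: the class name, the success-rate criterion, the window size, and the one-step advancement rule are all assumptions chosen for illustration. The abstract states only that each demonstration gets its own reverse curriculum generated via state resets.

```python
from typing import List


class ReverseCurriculumSketch:
    """Hypothetical per-demonstration reverse curriculum bookkeeping.

    Each demonstration keeps its own reset index into its state sequence.
    Episodes for that demonstration start from the state at that index.
    """

    def __init__(self, demo_lengths: List[int],
                 success_threshold: float = 0.8,  # assumed criterion
                 window: int = 5) -> None:        # assumed window size
        self.success_threshold = success_threshold
        self.window = window
        # Start every curriculum at the last demo state, nearest to success.
        self.start_idx = [n - 1 for n in demo_lengths]
        self.history: List[List[bool]] = [[] for _ in demo_lengths]

    def reset_state_index(self, demo_id: int) -> int:
        """Index of the demo state the environment should be reset to."""
        return self.start_idx[demo_id]

    def record(self, demo_id: int, success: bool) -> None:
        """Log an episode outcome; move the reset point earlier if warranted."""
        outcomes = self.history[demo_id]
        outcomes.append(success)
        if len(outcomes) < self.window or self.start_idx[demo_id] == 0:
            return
        recent = outcomes[-self.window:]
        if sum(recent) / self.window >= self.success_threshold:
            # Policy is reliable from here: step one state toward the start.
            self.start_idx[demo_id] -= 1
            outcomes.clear()

    def done(self) -> bool:
        """True once every demonstration's reset point is its first state."""
        return all(i == 0 for i in self.start_idx)
```

In this sketch, a training loop would call `reset_state_index` to choose the reset state for a sampled demonstration, roll out the policy, and pass the outcome to `record`. Once `done()` returns true, the policy succeeds from each demonstration's initial state, which corresponds to the narrow initial state distribution that the forward curriculum then widens.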
