DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks

Abstract

The success of many RL techniques heavily relies on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structures of the task, DrS learns a high-quality dense reward from sparse rewards and demonstrations, if given. The learned rewards can be reused in unseen tasks, thus reducing the human effort for reward engineering. Extensive experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks, resulting in improved performance and sample efficiency of RL algorithms. The learned rewards even achieve comparable performance to human-engineered rewards on some tasks. See our project page (https://sites.google.com/view/iclr24drs) for more details.
