Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents

Abstract

Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample-efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.
