The Power of Resets in Online Reinforcement Learning

Abstract

Simulators are a pervasive tool in reinforcement learning, but most existingalgorithms cannot efficiently exploit simulator access -- particularly inhigh-dimensional domains that require general function approximation. Weexplore the power of simulators through online reinforcement learning with{local simulator access} (or, local planning), an RL protocol where the agentis allowed to reset to previously observed states and follow their dynamicsduring training. We use local simulator access to unlock new statisticalguarantees that were previously out of reach: - We show that MDPs with low coverability Xie et al. 2023 -- a generalstructural condition that subsumes Block MDPs and Low-Rank MDPs -- can belearned in a sample-efficient fashion with only $Q^{\star}$-realizability(realizability of the optimal state-value function); existing online RLalgorithms require significantly stronger representation conditions. - As a consequence, we show that the notorious Exogenous Block MDP problemEfroni et al. 2022 is tractable under local simulator access. The results above are achieved through a computationally inefficientalgorithm. We complement them with a more computationally efficient algorithm,RVFS (Recursive Value Function Search), which achieves provable samplecomplexity guarantees under a strengthened statistical assumption known aspushforward coverability. RVFS can be viewed as a principled, provablecounterpart to a successful empirical paradigm that combines recursive search(e.g., MCTS) with value function approximation.

Quick Read (beta)

loading the full paper ...