Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning

Abstract

Offline reinforcement learning learns from a static dataset without interacting with the environment, which ensures safety and thus has good prospects for application. However, directly applying naive reinforcement learning methods usually fails in the offline setting because of function approximation errors caused by out-of-distribution (OOD) actions. To solve this problem, existing algorithms mainly penalize the Q-values of OOD actions, and the quality of these constraints also matters. Imprecise constraints may lead to suboptimal solutions, while precise constraints require significant computational cost. In this paper, we propose a novel count-based method for continuous domains, called the Grid-Mapping Pseudo-Count method (GPC), to penalize the Q-value appropriately and reduce the computational cost. The proposed method maps the state and action spaces to a discrete space and constrains their Q-values through the pseudo-count. We prove theoretically that only a few conditions are needed to obtain accurate uncertainty constraints in the proposed method. Moreover, we develop a Grid-Mapping Pseudo-Count Soft Actor-Critic (GPC-SAC) algorithm using GPC under the Soft Actor-Critic (SAC) framework to demonstrate the effectiveness of GPC. The experimental results on D4RL benchmark datasets show that GPC-SAC achieves better performance at lower computational cost than other algorithms.
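To make the grid-mapping idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a uniform grid with the same number of bins per dimension, known bounds on the concatenated state-action vector, and a penalty of the form beta / sqrt(n + 1), which is a common choice in count-based methods; the abstract does not specify the exact penalty used by GPC. The class name `GridPseudoCounter` and all parameter names are hypothetical.

```python
from collections import Counter

import numpy as np


class GridPseudoCounter:
    """Hypothetical helper: counts dataset visits per grid cell of (state, action)."""

    def __init__(self, low, high, bins):
        # low/high bound the concatenated (state, action) vector (assumed known).
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = int(bins)
        self.counts = Counter()

    def cell(self, state, action):
        # Map a continuous (state, action) pair to a tuple of integer grid indices.
        x = np.concatenate([np.atleast_1d(state), np.atleast_1d(action)]).astype(float)
        frac = (x - self.low) / (self.high - self.low)
        idx = np.clip(np.floor(frac * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx.tolist())

    def update(self, state, action):
        # Record one visit from the offline dataset.
        self.counts[self.cell(state, action)] += 1

    def count(self, state, action):
        # Pseudo-count: number of dataset pairs that fell into the same cell.
        return self.counts[self.cell(state, action)]

    def penalty(self, state, action, beta=1.0):
        # Assumed penalty form: large for rarely visited cells, small for frequent ones.
        return beta / np.sqrt(self.count(state, action) + 1.0)


# Usage: a 1-D state and 1-D action, both in [0, 1], with 4 bins per dimension.
counter = GridPseudoCounter(low=[0.0, 0.0], high=[1.0, 1.0], bins=4)
for _ in range(3):
    counter.update(state=0.1, action=0.1)

seen = counter.penalty(0.12, 0.2)    # same cell as the dataset pairs
unseen = counter.penalty(0.9, 0.9)   # cell never visited in the dataset
```

In this sketch the unseen pair receives the larger penalty, which could then be subtracted from the Q-target for that pair. Storing counts in a dictionary keyed by cell index keeps memory proportional to the number of visited cells rather than the full grid.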
