Return-Aligned Decision Transformer

Abstract

Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. However, as applications broaden, it becomes increasingly crucial to train agents that not only maximize the returns, but align the actual return with a specified target return, giving control over the agent's performance. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning and is equipped with a mechanism to control the agent using the target return. Despite being designed to align the actual return with the target return, we have empirically identified a discrepancy between the actual return and the target return in DT. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to effectively align the actual return with the target return. Our model decouples returns from the conventional input sequence, which typically consists of returns, states, and actions, to enhance the relationships between returns and states, as well as returns and actions. Extensive experiments show that RADT reduces the discrepancies between the actual return and the target return of DT-based methods.
