Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies

Abstract

Reinforcement learning policies are typically represented by black-box neural networks, which are non-interpretable and not well-suited for safety-critical domains. To address both of these issues, we propose constrained normalizing flow policies as interpretable and safe-by-construction policy models. We achieve safety for reinforcement learning problems with instantaneous safety constraints, for which we can exploit domain knowledge by analytically constructing a normalizing flow that ensures constraint satisfaction. The normalizing flow corresponds to an interpretable sequence of transformations on action samples, each ensuring alignment with respect to a particular constraint. Our experiments reveal benefits beyond interpretability in an easier learning objective and maintained constraint satisfaction throughout the entire learning process. Our approach leverages constraints over reward engineering while offering enhanced interpretability, safety, and direct means of providing domain knowledge to the agent without relying on complex reward functions.
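
To make the idea of a flow as a sequence of constraint-aligning transformations concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a one-dimensional action with a simple interval constraint [low, high]; the function names (tanh_step, affine_step, sample_constrained_action) and the choice of a Gaussian base distribution are illustrative assumptions. Each step is invertible and reports its log-Jacobian, so the log-probability of the constrained action follows from the change-of-variables formula.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanh_step(z):
    """Hypothetical flow step: squash the real line into (-1, 1).

    Returns the transformed value and log|du/dz| = log(1 - tanh(z)^2),
    written in a form that stays finite when tanh saturates.
    """
    u = math.tanh(z)
    log_det = 2.0 * (math.log(2.0) - z - softplus(-2.0 * z))
    return u, log_det


def affine_step(u, low, high):
    """Hypothetical flow step: map (-1, 1) onto the feasible interval (low, high)."""
    half_width = 0.5 * (high - low)
    a = low + half_width * (u + 1.0)
    return a, math.log(half_width)


def gaussian_log_prob(z, mean, std):
    """Log-density of the (assumed) Gaussian base distribution."""
    return -0.5 * ((z - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def sample_constrained_action(mean, std, low, high, rng):
    """Draw a base sample, push it through both steps, and track its log-probability."""
    z = rng.gauss(mean, std)
    log_prob = gaussian_log_prob(z, mean, std)
    u, log_det = tanh_step(z)
    log_prob -= log_det  # change of variables for the squashing step
    a, log_det = affine_step(u, low, high)
    log_prob -= log_det  # change of variables for the rescaling step
    return a, log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    action, log_prob = sample_constrained_action(0.0, 1.0, low=-1.0, high=0.5, rng=rng)
    print(action, log_prob)
```

In this sketch every sampled action lies inside the interval regardless of the base distribution's parameters, which is the sense in which such a policy can satisfy a constraint during the whole learning process rather than only after training. Constraints that depend on the state, or that couple several action dimensions, would need steps designed for them.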
