Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients

Abstract

As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions. We develop a model-based stochastic gradient estimator for optimizing privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting, and a powerful direct estimator that can be used in an environment with differentiable dynamics. We contrast previous work in differentially private RL with our mutual-information formulation of information disclosure. Experimental results show that our training method yields policies that hide the sensitive state, even in challenging high-dimensional tasks.
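To make the reward/disclosure trade-off concrete, here is a minimal sketch of an MI-regularized objective, J = E[r] - λ·I(U; A), for a deliberately simplified case. It assumes a single decision step, a discrete sensitive variable U with known prior p(u), and a tabular policy π(a | u). The function names `mutual_information` and `regularized_objective` and the weight `lam` are hypothetical illustrations. This is not the paper's model-based gradient estimator and does not handle sequential episodes. It only evaluates the quantity that such an estimator would optimize.

```python
import numpy as np


def mutual_information(p_u, pi_a_given_u):
    """I(U; A) in nats for a discrete sensitive variable U and discrete action A.

    p_u:          shape (n_u,), prior over the sensitive variable U (assumed known).
    pi_a_given_u: shape (n_u, n_a), row u is the policy's action distribution given U=u.
    """
    p_u = np.asarray(p_u, dtype=float)
    pi = np.asarray(pi_a_given_u, dtype=float)
    p_a = p_u @ pi                      # marginal action distribution p(a)
    joint = p_u[:, None] * pi           # joint distribution p(u, a)
    mask = joint > 0                    # convention: 0 * log 0 = 0
    p_a_full = np.broadcast_to(p_a, pi.shape)
    # I(U; A) = sum_{u,a} p(u, a) * log( pi(a|u) / p(a) )
    return float(np.sum(joint[mask] * np.log(pi[mask] / p_a_full[mask])))


def regularized_objective(p_u, pi_a_given_u, reward, lam):
    """Expected one-step reward minus lam * I(U; A) (hypothetical single-step setting)."""
    p_u = np.asarray(p_u, dtype=float)
    pi = np.asarray(pi_a_given_u, dtype=float)
    reward = np.asarray(reward, dtype=float)   # reward[u, a]
    expected_reward = float(np.sum(p_u[:, None] * pi * reward))
    return expected_reward - lam * mutual_information(p_u, pi)


# Toy example: the reward is 1 when the action matches the sensitive bit.
p_u = [0.5, 0.5]
reward = [[1.0, 0.0], [0.0, 1.0]]
revealing = [[1.0, 0.0], [0.0, 1.0]]   # action copies U: high reward, MI = log 2
uniform = [[0.5, 0.5], [0.5, 0.5]]     # action ignores U: lower reward, MI = 0

for lam in (0.5, 1.0):
    print(lam,
          regularized_objective(p_u, revealing, reward, lam),
          regularized_objective(p_u, uniform, reward, lam))
```

With λ = 0.5 the revealing policy still scores higher (about 0.653 versus 0.5). With λ = 1.0 the penalty of log 2 ≈ 0.693 nats outweighs the extra reward, so the uninformative policy wins (0.5 versus about 0.307). In the setting the abstract describes, this penalty would instead be estimated over trajectories and differentiated with respect to the policy parameters.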
