Toward Evaluating Robustness of Reinforcement Learning with Adversarial Policy

Abstract

Reinforcement learning agents are susceptible to evasion attacks during deployment. In single-agent environments, these attacks can occur through imperceptible perturbations injected into the inputs of the victim policy network. In multi-agent environments, an attacker can manipulate an adversarial opponent to influence the victim policy's observations indirectly. While adversarial policies offer a promising technique for crafting such attacks, current methods are either sample-inefficient due to poor exploration strategies or require training an extra surrogate model under the black-box assumption. To address these challenges, we propose Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box adversarial policy learning in both single- and multi-agent environments. We formulate four types of adversarial intrinsic regularizers, which maximize adversarial state coverage, policy coverage, risk, or divergence, to discover potential vulnerabilities of the victim policy in a principled way. We also present a novel bias-reduction method that adaptively balances the extrinsic objective against the adversarial intrinsic regularizers. Our experiments validate the effectiveness of the four types of adversarial intrinsic regularizers and of the bias-reduction method in enhancing black-box adversarial policy learning across a variety of environments. IMAP successfully evades two types of defense methods, adversarial training and robust regularizers, decreasing the performance of state-of-the-art robust WocaR-PPO agents by 34% to 54% across four single-agent tasks. IMAP also achieves a state-of-the-art attack success rate of 83.91% in the multi-agent game YouShallNotPass. Our code is available at https://github.com/x-zheng16/IMAP.
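To make the idea of an intrinsic regularizer concrete, the snippet below is a minimal sketch of one generic way to reward state coverage: a particle-based (k-nearest-neighbour) novelty bonus added to the attacker's extrinsic reward with a weight `tau`. This is a standard exploration technique swapped in for illustration, not necessarily how IMAP computes its state-coverage regularizer. The function names `knn_state_coverage_bonus` and `shaped_rewards` are hypothetical, and `tau` is held fixed here. IMAP instead balances the two terms adaptively with its bias-reduction method, which this sketch does not attempt to reproduce.

```python
import math
from typing import List, Sequence


def knn_state_coverage_bonus(states: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Hypothetical helper: particle-based state-coverage bonus.

    For each state in a batch, take the Euclidean distance to its k-th
    nearest neighbour among the other states in the same batch. States in
    sparsely visited regions receive larger bonuses. O(n^2), so this is only
    meant for small illustrative batches.
    """
    n = len(states)
    if n <= k:
        raise ValueError("need more than k states in the batch")
    bonuses = []
    for i, s in enumerate(states):
        # Distances from state i to every other state, smallest first.
        dists = sorted(math.dist(s, t) for j, t in enumerate(states) if j != i)
        # log(1 + d) keeps the bonus non-negative and dampens large distances.
        bonuses.append(math.log(1.0 + dists[k - 1]))
    return bonuses


def shaped_rewards(
    extrinsic: Sequence[float],
    states: Sequence[Sequence[float]],
    tau: float = 0.1,
    k: int = 3,
) -> List[float]:
    """Hypothetical helper: extrinsic adversarial reward plus a tau-weighted
    intrinsic coverage bonus. tau is a fixed assumption in this sketch; it is
    not IMAP's adaptive bias-reduction scheme."""
    if len(extrinsic) != len(states):
        raise ValueError("one extrinsic reward per state is required")
    intrinsic = knn_state_coverage_bonus(states, k)
    return [r + tau * b for r, b in zip(extrinsic, intrinsic)]
```

For example, with four states clustered near the origin and one outlier at `(5.0, 5.0)`, `knn_state_coverage_bonus(states, k=2)` gives the outlier the largest bonus. An attacker maximizing the shaped reward is therefore pushed toward states it has rarely visited, which is the intuition behind using coverage to uncover victim vulnerabilities.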
