Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

Abstract

To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments, the problem remains understudied -- despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DRNVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length.
