DePAint: A Decentralized Safe Multi-Agent Reinforcement Learning Algorithm considering Peak and Average Constraints

Abstract

The domain of safe multi-agent reinforcement learning (MARL), despite its potential applications in areas ranging from drone delivery and vehicle automation to the development of zero-energy communities, remains relatively unexplored. The primary challenge involves training agents to learn optimal policies that maximize rewards while adhering to stringent safety constraints, all without the oversight of a central controller. These constraints are critical in a wide array of applications. Moreover, ensuring the privacy of sensitive information in decentralized settings introduces an additional layer of complexity, necessitating innovative solutions that uphold privacy while achieving the system's safety and efficiency goals. In this paper, we address the problem of multi-agent policy optimization in a decentralized setting, where agents communicate with their neighbors to maximize the sum of their cumulative rewards while also satisfying each agent's safety constraints. We consider both peak and average constraints. In this scenario, there is no central controller coordinating the agents and both the rewards and constraints are only known to each agent locally/privately. We formulate the problem as a decentralized constrained multi-agent Markov Decision Problem and propose a momentum-based decentralized policy gradient method, DePAint, to solve it. To the best of our knowledge, this is the first privacy-preserving fully decentralized multi-agent reinforcement learning algorithm that considers both peak and average constraints. We then provide theoretical analysis and empirical evaluation of our algorithm in a number of scenarios and compare its performance to centralized algorithms that consider similar constraints.
