Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Abstract

To improve the efficiency of reinforcement learning, we propose a novelasynchronous federated reinforcement learning framework termed AFedPG, whichconstructs a global model through collaboration among $N$ agents using policygradient (PG) updates. To handle the challenge of lagged policies inasynchronous settings, we design delay-adaptive lookahead and normalized updatetechniques that can effectively handle the heterogeneous arrival times ofpolicy gradients. We analyze the theoretical global convergence bound ofAFedPG, and characterize the advantage of the proposed algorithm in terms ofboth the sample complexity and time complexity. Specifically, our AFedPG methodachieves $\mathcal{O}(\frac{{\epsilon}^{-2.5}}{N})$ sample complexity at eachagent on average. Compared to the single agent setting with$\mathcal{O}(\epsilon^{-2.5})$ sample complexity, it enjoys a linear speedupwith respect to the number of agents. Moreover, compared to synchronous FedPG,AFedPG improves the time complexity from $\mathcal{O}(\frac{t_{\max}}{N})$ to$\mathcal{O}(\frac{1}{\sum_{i=1}^{N} \frac{1}{t_{i}}})$, where $t_{i}$ denotesthe time consumption in each iteration at the agent $i$, and $t_{\max}$ is thelargest one. The latter complexity $\mathcal{O}(\frac{1}{\sum_{i=1}^{N}\frac{1}{t_{i}}})$ is always smaller than the former one, and this improvementbecomes significant in large-scale federated settings with heterogeneouscomputing powers ($t_{\max}\gg t_{\min}$). Finally, we empirically verify theimproved performances of AFedPG in three MuJoCo environments with varyingnumbers of agents. We also demonstrate the improvements with differentcomputing heterogeneity.

Quick Read (beta)

loading the full paper ...