Abstract
Federated reinforcement learning (FRL) has emerged as a promising paradigmfor reducing the sample complexity of reinforcement learning tasks byexploiting information from different agents. However, when each agentinteracts with a potentially different environment, little to nothing is knowntheoretically about the non-asymptotic performance of FRL algorithms. The lackof such results can be attributed to various technical challenges and theirintricate interplay: Markovian sampling, linear function approximation,multiple local updates to save communication, heterogeneity in the rewardfunctions and transition kernels of the agents' MDPs, and continuousstate-action spaces. Moreover, in the on-policy setting, the behavior policiesvary with time, further complicating the analysis. In response, we introduceFedSARSA, a novel federated on-policy reinforcement learning scheme, equippedwith linear function approximation, to address these challenges and provide acomprehensive finite-time error analysis. Notably, we establish that FedSARSAconverges to a policy that is near-optimal for all agents, with the extent ofnear-optimality proportional to the level of heterogeneity. Furthermore, weprove that FedSARSA leverages agent collaboration to enable linear speedups asthe number of agents increases, which holds for both fixed and adaptivestep-size configurations.