Safe Reinforcement Learning on the Constraint Manifold: Theory and Applications

Abstract

Integrating learning-based techniques, especially reinforcement learning, into robotics is promising for solving complex problems in unstructured environments. However, most existing approaches are trained in well-tuned simulators and subsequently deployed on real robots without online fine-tuning. In this setting, the simulation's realism seriously impacts the deployment's success rate. Learning with real-world interaction data offers a promising alternative: it not only eliminates the need for a fine-tuned simulator but also applies to a broader range of tasks where accurate modeling is infeasible. One major problem for on-robot reinforcement learning is ensuring safety, as uncontrolled exploration can cause catastrophic damage to the robot or the environment. Indeed, safety specifications, often represented as constraints, can be complex and non-linear, making safety challenging to guarantee in learning systems. In this paper, we show how we can impose complex safety constraints on learning-based robotics systems in a principled manner, both from theoretical and practical points of view. Our approach is based on the concept of the Constraint Manifold, representing the set of safe robot configurations. Exploiting differential geometry techniques, i.e., the tangent space, we can construct a safe action space, allowing learning agents to sample arbitrary actions while ensuring safety. We demonstrate the method's effectiveness in a real-world Robot Air Hockey task, showing that our method can handle high-dimensional tasks with complex constraints. Videos of the real robot experiments are available on the project website (https://puzeliu.github.io/TRO-ATACOM).
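
To make the tangent-space idea more concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a single toy equality constraint c(q) = q·q - 1 = 0 (a unit circle) in place of a real robot's safety constraints, and the function names (constraint, constraint_jacobian, tangent_basis, safe_velocity) and the gain parameter are hypothetical choices made for illustration. An arbitrary action is mapped through a basis of the null space of the constraint Jacobian, so that the resulting velocity stays tangent to the constraint set to first order.

```python
import numpy as np

def constraint(q):
    # Toy equality constraint (assumption): c(q) = q.q - 1 = 0, the unit circle.
    return np.array([q @ q - 1.0])

def constraint_jacobian(q):
    # Jacobian of c with respect to q, shape (1, n).
    return 2.0 * q[None, :]

def tangent_basis(J, tol=1e-10):
    # Orthonormal basis of the null space of J, i.e. the tangent space
    # of the constraint set at the current configuration.
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def safe_velocity(q, action, gain=1.0):
    # Map an arbitrary action to a velocity tangent to the constraint set,
    # plus a correction term that pulls q back if it has drifted off it.
    J = constraint_jacobian(q)
    N = tangent_basis(J)
    correction = -gain * np.linalg.pinv(J) @ constraint(q)
    return N @ action + correction

q = np.array([1.0, 0.0])                   # a configuration on the circle
rng = np.random.default_rng(0)
action = rng.uniform(-1.0, 1.0, size=1)    # arbitrary sampled action
dq = safe_velocity(q, action)
drift = constraint_jacobian(q) @ dq        # first-order change of c(q)
print(dq, drift)
```

Here the action has one dimension fewer than the configuration, because one direction is consumed by the constraint. Whatever value is sampled, the first-order change of the constraint is zero at a configuration that satisfies it.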
