Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

  • 2024-04-11 10:50:07
  • Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Jan Humplik, Markus Wulfmeier, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess

Abstract

We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and it transitions between them in a smooth, stable, and efficient manner. The agent's locomotion and tactical behavior adapts to specific game contexts in a way that would be impractical to manually design. The agent also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. Our agent was trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer. Although the robots are inherently fragile, basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way -- well beyond what is intuitively expected from the robot. Indeed, in experiments, they walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives.

 
