Model predictive control-based value estimation for efficient reinforcement learning

Abstract

Reinforcement learning is limited in practice primarily by the number of interactions it requires with virtual environments. This is a challenging problem because many learning methods cannot plausibly obtain a locally optimal strategy from only a few attempts. We therefore design an improved reinforcement learning method based on model predictive control that models the environment through a data-driven approach. Based on the learned environment model, it performs multi-step prediction to estimate the value function and optimize the policy. The method demonstrates higher learning efficiency, faster convergence of strategies toward the locally optimal value, and a smaller sample capacity required by experience replay buffers. Experimental results, both in classic databases and in a dynamic obstacle avoidance scenario for an unmanned aerial vehicle, validate the proposed approach.
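
As a rough illustration of the idea of estimating a value from multi-step predictions of a learned model, the following minimal sketch may help. It is not the paper's algorithm: the scalar linear model s' ≈ a*s + b*u, the least-squares fit, and the names `fit_linear_model` and `n_step_value` are all assumptions made for this example.

```python
def fit_linear_model(transitions):
    """Least-squares fit of s' ~ a*s + b*u from (s, u, s_next) tuples.

    Hypothetical stand-in for a data-driven environment model;
    the scalar linear form is an assumption of this sketch.
    """
    sss = sum(s * s for s, u, n in transitions)
    suu = sum(u * u for s, u, n in transitions)
    ssu = sum(s * u for s, u, n in transitions)
    ssn = sum(s * n for s, u, n in transitions)
    sun = sum(u * n for s, u, n in transitions)
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sss * suu - ssu * ssu
    a = (ssn * suu - ssu * sun) / det
    b = (sss * sun - ssu * ssn) / det
    return a, b


def n_step_value(state, model, reward, policy, terminal_value, horizon, gamma):
    """Estimate a state's value by rolling the learned model forward.

    Sums discounted predicted rewards over `horizon` steps, then
    bootstraps with `terminal_value` at the last predicted state.
    """
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        total += discount * reward(state, action)
        state = model(state, action)  # predicted next state
        discount *= gamma
    return total + discount * terminal_value(state)


# Illustrative data generated from s' = 0.5*s + 1.0*u (assumed, not from the paper).
data = [(1.0, 0.0, 0.5), (0.0, 1.0, 1.0), (2.0, 1.0, 2.0), (1.0, -1.0, -0.5)]
a, b = fit_linear_model(data)
estimate = n_step_value(
    state=2.0,
    model=lambda s, u: a * s + b * u,
    reward=lambda s, u: -(s * s),   # assumed quadratic state cost
    policy=lambda s: 0.0,           # assumed fixed policy
    terminal_value=lambda s: 0.0,   # assumed zero bootstrap value
    horizon=2,
    gamma=0.9,
)
```

Here the horizon trades model error against bootstrap error: longer rollouts lean more on the learned model, shorter ones lean more on the terminal value estimate.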
