The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Abstract

Stochastic approximation is a class of algorithms that update a vectoriteratively, incrementally, and stochastically, including, e.g., stochasticgradient descent and temporal difference learning. One fundamental challenge inanalyzing a stochastic approximation algorithm is to establish its stability,i.e., to show that the stochastic vector iterates are bounded almost surely. Inthis paper, we extend the celebrated Borkar-Meyn theorem for stability from theMartingale difference noise setting to the Markovian noise setting, whichgreatly improves its applicability in reinforcement learning, especially inthose off-policy reinforcement learning algorithms with linear functionapproximation and eligibility traces. Central to our analysis is thediminishing asymptotic rate of change of a few functions, which is implied byboth a form of strong law of large numbers and a commonly used V4 Lyapunovdrift condition and trivially holds if the Markov chain is finite andirreducible.

Quick Read (beta)

loading the full paper ...