Please use this identifier to cite or link to this item:
http://dspace.mediu.edu.my:8181/xmlui/handle/1721.1/7205
| Title: | On the Convergence of Stochastic Iterative Dynamic Programming Algorithms |
| Keywords: | reinforcement learning; stochastic approximation; convergence; dynamic programming |
| Issue Date: | 9-Oct-2013 |
| Description: | Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong. |
| URI: | http://koha.mediu.edu.my:8181/xmlui/handle/1721 |
| Other Identifiers: | AIM-1441 CBCL-084 http://hdl.handle.net/1721.1/7205 |
| Appears in Collections: | MIT Items |
Files in This Item:
There are no files associated with this item.
