Please use this identifier to cite or link to this item: http://dspace.mediu.edu.my:8181/xmlui/handle/1721.1/7205
Full metadata record
DC Field | Value | Language
dc.creator | Jaakkola, Tommi | -
dc.creator | Jordan, Michael I. | -
dc.creator | Singh, Satinder P. | -
dc.date | 2004-10-20T20:49:46Z | -
dc.date | 1993-08-01 | -
dc.date.accessioned | 2013-10-09T02:48:33Z | -
dc.date.available | 2013-10-09T02:48:33Z | -
dc.date.issued | 2013-10-09 | -
dc.identifier | AIM-1441 | -
dc.identifier | CBCL-084 | -
dc.identifier | http://hdl.handle.net/1721.1/7205 | -
dc.identifier.uri | http://koha.mediu.edu.my:8181/xmlui/handle/1721 | -
dc.description | Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong. | -
dc.format | 15 p. | -
dc.format | 77605 bytes | -
dc.format | 356324 bytes | -
dc.format | application/octet-stream | -
dc.format | application/pdf | -
dc.language | en_US | -
dc.relation | AIM-1441 | -
dc.relation | CBCL-084 | -
dc.subject | reinforcement learning | -
dc.subject | stochastic approximation | -
dc.subject | convergence | -
dc.subject | dynamic programming | -
dc.title | On the Convergence of Stochastic Iterative Dynamic Programming Algorithms | -
Appears in Collections: MIT Items

