On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Please use this identifier to cite or link to this item: http://dspace.mediu.edu.my:8181/xmlui/handle/1721.1/7205

Full metadata record

DC Field	Value	Language
dc.creator	Jaakkola, Tommi	-
dc.creator	Jordan, Michael I.	-
dc.creator	Singh, Satinder P.	-
dc.date	2004-10-20T20:49:46Z	-
dc.date	1993-08-01	-
dc.date.accessioned	2013-10-09T02:48:33Z	-
dc.date.available	2013-10-09T02:48:33Z	-
dc.date.issued	2013-10-09	-
dc.identifier	AIM-1441	-
dc.identifier	CBCL-084	-
dc.identifier	http://hdl.handle.net/1721.1/7205	-
dc.identifier.uri	http://koha.mediu.edu.my:8181/xmlui/handle/1721	-
dc.description	Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.	-
dc.format	15 p.	-
dc.format	77605 bytes	-
dc.format	356324 bytes	-
dc.format	application/octet-stream	-
dc.format	application/pdf	-
dc.language	en_US	-
dc.relation	AIM-1441	-
dc.relation	CBCL-084	-
dc.subject	reinforcement learning	-
dc.subject	stochastic approximation	-
dc.subject	convergence	-
dc.subject	dynamic programming	-
dc.title	On the Convergence of Stochastic Iterative Dynamic Programming Algorithms	-
Appears in Collections:	MIT Items

Files in This Item:

There are no files associated with this item.
