Choosing the step size $\alpha_t$
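Since
$$|r_{t+1}(s) - r_0(s)| \le \sum_{i=0}^{t} \alpha_i \,\big|(Hr_i)(s) - r_i(s) + W_i(s)\big| \le A \sum_{i=0}^{t} \alpha_i$$
(where $A$ is a constant that bounds $|(Hr_i)(s) - r_i(s) + W_i(s)|$), the iterate can never move farther than $A \sum_i \alpha_i$ from its starting point $r_0$.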
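We therefore need to require that
$$\sum_{t=0}^{\infty} \alpha_t = \infty.$$
Otherwise, we have to assume that the distance $\|r^* - r_0\|$ is bounded (specifically, by at most $A \sum_t \alpha_t$). The condition above ensures that the optimal value can be reached from any starting point.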
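The displacement bound can be seen numerically in a minimal sketch. It assumes a scalar, noise-free version of the iteration ($W_i = 0$) with a hypothetical constant operator $(Hr) = r^*$, so each step is $r \leftarrow r + \alpha(r^* - r)$. With a summable schedule the iterate stalls within $A\sum_t \alpha_t$ of $r_0$; with a harmonic-type schedule it reaches $r^*$. The function name `run` and all numeric values are illustrative choices, not part of the notes.

```python
def run(r0, r_star, step_sizes):
    """Noise-free iteration r <- r + alpha * ((H r) - r), using the
    hypothetical constant operator (H r) = r_star."""
    r = r0
    for alpha in step_sizes:
        r = r + alpha * (r_star - r)
    return r


T = 1000
r0, r_star = 0.0, 100.0
summable = [0.1 * 0.5 ** t for t in range(T)]   # sum(alpha) is about 0.2
divergent = [1.0 / (t + 2) for t in range(T)]   # harmonic-type, sum grows like log T

# Every iterate is a convex combination of r0 and r_star, so |(H r_i) - r_i| <= A.
A = abs(r_star - r0)

r_sum = run(r0, r_star, summable)
r_div = run(r0, r_star, divergent)
print(f"summable:  r_T = {r_sum:.2f}, bound A*sum(alpha) = {A * sum(summable):.2f}")
print(f"divergent: r_T = {r_div:.2f}")
```

With the divergent schedule one can check by induction that $r_T = r^*\,(1 - \frac{1}{T+1})$, so the gap to $r^*$ closes, while the summable run stays below 20 no matter how large $T$ is.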
We add a second condition to ensure that $\alpha_t$ decays to zero fast enough:
$$\sum_{t=0}^{\infty} \alpha_t^2 < \infty.$$
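One simple choice is $\alpha_t = \frac{1}{t+1}$, i.e. $1, \frac{1}{2}, \frac{1}{3}, \dots$. It satisfies both conditions: $\sum_{t=0}^{\infty} \frac{1}{t+1}$ is the harmonic series, which diverges, while $\sum_{t=0}^{\infty} \frac{1}{(t+1)^2} = \frac{\pi^2}{6} < \infty$.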
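A minimal sketch of what the step size $\alpha_t = 1/(t+1)$ does, assuming a scalar iteration $r \leftarrow (1-\alpha_t)\,r + \alpha_t\,\big((Hr) + W_t\big)$ with a hypothetical constant operator $(Hr) = c$ and zero-mean Gaussian noise $W_t$. Under these assumptions the iterate is exactly the running average of the noisy samples (the starting value $r_0$ is discarded at the first step, since $\alpha_0 = 1$). The sketch also checks the two series conditions on partial sums. The names `iterate`, `c`, and the noise model are illustrative assumptions.

```python
import math
import random


def iterate(r0, c, T, seed=0):
    """Hypothetical scalar iteration r <- (1 - a) r + a (c + W) with
    a = 1/(t+1), constant operator (H r) = c and zero-mean noise W."""
    rng = random.Random(seed)
    r = r0
    samples = []
    for t in range(T):
        alpha = 1.0 / (t + 1)
        x = c + rng.gauss(0.0, 1.0)   # plays the role of (H r)(s) + W_t(s)
        samples.append(x)
        r = (1 - alpha) * r + alpha * x
    return r, samples


T = 10_000
r, samples = iterate(r0=50.0, c=3.0, T=T)
print(r, sum(samples) / len(samples))   # equal up to rounding: r is the running mean

# Partial sums of the two series conditions for alpha_t = 1/(t+1).
s1 = sum(1.0 / (t + 1) for t in range(T))        # about ln T + 0.577, unbounded in T
s2 = sum(1.0 / (t + 1) ** 2 for t in range(T))   # always below pi^2 / 6
print(s1, s2, math.pi ** 2 / 6)
```

Because the noise is averaged with equal weights, the error of $r_T$ shrinks like $1/\sqrt{T}$ in this toy setting; a summable-square schedule is what keeps the accumulated noise from dominating.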
Yishay Mansour
1999-12-16