Notations
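- $Q_l$: a (state, action) value function defined by
  $Q_l(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V_{l-1}(s')$,
  where $V_{l-1}(s') = \max_b Q_{l-1}(s',b)$.
  Note that this is the update performed by the Value-Iteration algorithm.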
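- $\hat{Q}_l$: a (state, action) value function defined by
  $\hat{Q}_l(s,a) = R(s,a) + \gamma \frac{1}{m_D} \sum_{k=1}^{m_D} \hat{V}_{l-1}(t_k)$,
  where $\hat{V}_{l-1}(s) = \max_b \hat{Q}_{l-1}(s,b)$,
  and $t_1, \dots, t_{m_D}$ are the $m_D$ next states
  observed from $(s,a)$ on the $m_D$ calls to $PS(M)$. Note that this is
  the update performed by the phased Q-learning algorithm.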
- $Q^*$ denotes, as usual, the optimal value function.
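
The two updates in the notation above can be compared side by side in a small sketch. This is only an illustration under stated assumptions, not the algorithm as analyzed in the proof: the function names `exact_update`, `phased_update` and `sample_next` are hypothetical, `sample_next` stands in for a call to PS(M), and the two-state MDP (`R`, `P`, `gamma`, `m_D`) is made up for the example.

```python
import random

def exact_update(Q_prev, R, P, gamma):
    """One Value-Iteration step: uses the true transition probabilities P."""
    n_states, n_actions = len(R), len(R[0])
    V_prev = [max(Q_prev[s]) for s in range(n_states)]  # V_{l-1}(s) = max_b Q_{l-1}(s,b)
    return [[R[s][a] + gamma * sum(P[s][a][t] * V_prev[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)]

def phased_update(Q_hat_prev, R, sample_next, m_D, gamma):
    """One phase: replaces the expectation by an average over m_D sampled next states."""
    n_states, n_actions = len(R), len(R[0])
    V_hat_prev = [max(Q_hat_prev[s]) for s in range(n_states)]
    Q_hat = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            # t_1, ..., t_{m_D}: next states observed from (s, a)
            samples = [sample_next(s, a) for _ in range(m_D)]
            Q_hat[s][a] = R[s][a] + gamma * sum(V_hat_prev[t] for t in samples) / m_D
    return Q_hat

# Hypothetical toy MDP: 2 states, 2 actions (all values are assumptions).
R = [[1.0, 0.0], [0.0, 2.0]]
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
gamma, m_D = 0.9, 2000
rng = random.Random(0)

def sample_next(s, a):
    # Stand-in for PS(M): draw a next state according to P[s][a].
    return rng.choices(range(len(P[s][a])), weights=P[s][a])[0]

Q = [[0.0, 0.0], [0.0, 0.0]]
Q_hat = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(10):
    Q = exact_update(Q, R, P, gamma)
    Q_hat = phased_update(Q_hat, R, sample_next, m_D, gamma)

max_gap = max(abs(Q[s][a] - Q_hat[s][a]) for s in range(2) for a in range(2))
print("largest gap between Q_l and its sampled estimate:", max_gap)
```

The only difference between the two functions is that the expectation over next states is replaced by an empirical average over the $m_D$ samples, so the printed gap should shrink as `m_D` grows.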
Yishay Mansour
2000-05-30