Direct Algorithm - Phased-Q-Learning
The Phased-Q-Learning algorithm is similar to the Q-Learning algorithm
we encountered in class, except that it works in phases. In each phase
the algorithm makes mD calls to PS(M), where mD is determined later by
the analysis. Each call yields one sample for every state-action pair,
so a phase collects mD samples per pair. The algorithm uses these
samples to update the value function, replacing the expectation over
next states with the average over the mD sampled next states.
Note that the Phased-Q-Learning algorithm requires mD · lD calls to
PS(M) in total, where lD is the number of phases performed.
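The phase structure can be sketched in Python. This is a minimal sketch under stated assumptions, not the notes' own formulation: it assumes the usual Phased Q-Learning update, Q(s,a) <- R(s,a) + gamma * (1/mD) * sum of V(s_i) over the mD sampled next states s_i, with V(s) = max_a Q(s,a). The discount factor gamma, the known reward table, the zero initial estimate, the function ps() standing in for PS(M), and the two-state MDP are all hypothetical choices made for illustration.

```python
from itertools import product


def phased_q_learning(states, actions, reward, ps, gamma, m_d, l_d):
    """Run l_d phases, each using m_d calls to ps(), a stand-in for PS(M).

    ps() must return a dict mapping every (state, action) pair to one
    sampled next state. reward maps (state, action) to a known reward.
    """
    pairs = list(product(states, actions))
    q = {pair: 0.0 for pair in pairs}  # initial estimate (assumed to be 0)
    for _ in range(l_d):
        # State values under the current estimate: V(s) = max_a Q(s, a).
        v = {s: max(q[(s, a)] for a in actions) for s in states}
        # m_d calls to PS(M): one sampled next state per pair per call.
        samples = [ps() for _ in range(m_d)]
        # Replace the expectation over next states by the sample average.
        q = {
            pair: reward[pair] + gamma * sum(v[smp[pair]] for smp in samples) / m_d
            for pair in pairs
        }
    return q


# Hypothetical two-state MDP with deterministic transitions.
STATES = [0, 1]
ACTIONS = ["stay", "move"]
REWARD = {(s, a): float(s) for s in STATES for a in ACTIONS}  # reward 1 in state 1
calls = 0


def ps():
    """Toy PS(M): 'stay' keeps the state, 'move' flips it."""
    global calls
    calls += 1
    return {(s, a): s if a == "stay" else 1 - s for s in STATES for a in ACTIONS}


q = phased_q_learning(STATES, ACTIONS, REWARD, ps, gamma=0.5, m_d=3, l_d=30)
print(q[(1, "stay")], calls)
```

Because the toy transitions are deterministic, the sample average is exact and each phase reduces to one step of value iteration; with random transitions, mD controls how closely the average tracks the true expectation. The counter confirms that 3 calls per phase over 30 phases gives 90 calls to ps() in total.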
Yishay Mansour
2000-05-30