The Main Theorem - Bound on the Number of Samples
The main theorem of the article bounds the number of calls to the
subroutine PS(M) that the learning algorithms require in order to
ensure, with probability at least $1-\delta$, that the resulting
policy is an $\epsilon$-optimal policy. The bounds stated in
the article are:
Theorem 5.1 (Main Theorem)
- For an appropriate choice of the parameters $m_D$ and $l_D$,
the total number of calls to PS(M) required by the Phased-Q-Learning
algorithm to ensure that, with probability at least $1-\delta$,
the expected return of the resulting policy is within $\epsilon$
of that of the optimal policy is:
(1)
- For an appropriate choice of the parameters $m_I$ and $l_I$,
the total number of calls to PS(M) required by the indirect algorithm
to ensure that, with probability at least $1-\delta$, the expected
return of the resulting policy is within $\epsilon$ of that of the
optimal policy is:
(2)
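To make the quantity being bounded concrete, here is a minimal Python sketch that counts calls to a stand-in for PS(M). It assumes that one call to PS(M) returns one sampled next state for every state-action pair (the "parallel sampling" model), that Phased-Q-Learning makes $m_D$ fresh calls in each of $l_D$ phases, and that the indirect algorithm makes all $m_I$ calls up front to build an empirical model and then runs $l_I$ rounds of value iteration on that model without further sampling. The toy MDP, the deterministic rewards, the discount factor `gamma`, and the names `make_random_mdp`, `ParallelSampler`, `phased_q_learning` and `indirect_algorithm` are all hypothetical choices for illustration, not the article's definitions.

```python
import random

# Hypothetical toy MDP: P[s][a] is a next-state distribution, R[s][a] a deterministic reward in [0, 1).
def make_random_mdp(n_states, n_actions, seed=0):
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


class ParallelSampler:
    """Hypothetical stand-in for PS(M): each call returns one sampled next state for EVERY (s, a)."""

    def __init__(self, P, seed=1):
        self.P = P
        self.rng = random.Random(seed)
        self.calls = 0  # number of calls to PS(M) made so far

    def __call__(self):
        self.calls += 1
        states = range(len(self.P))
        return [[self.rng.choices(states, weights=p_sa)[0] for p_sa in P_s]
                for P_s in self.P]


# Simplified discounted variant (assumption). P is used only for its shape, never for its probabilities.
def phased_q_learning(P, R, gamma, m, l, ps):
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(l):                       # l phases
        samples = [ps() for _ in range(m)]   # m fresh calls to PS(M) in every phase
        # Average the one-step backups over the m sampled next states of each (s, a).
        Q = [[R[s][a] + gamma * sum(V[x[s][a]] for x in samples) / m
              for a in range(n_a)] for s in range(n_s)]
        V = [max(row) for row in Q]
    return Q


def indirect_algorithm(P, R, gamma, m, l, ps):
    n_s, n_a = len(P), len(P[0])
    samples = [ps() for _ in range(m)]       # all m calls to PS(M) happen up front
    # Empirical transition model built from the m sampled next states of each (s, a).
    P_hat = [[[0.0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    for x in samples:
        for s in range(n_s):
            for a in range(n_a):
                P_hat[s][a][x[s][a]] += 1.0 / m
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(l):                       # value iteration on P_hat: no further calls
        Q = [[R[s][a] + gamma * sum(P_hat[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [max(row) for row in Q]
    return Q


P, R = make_random_mdp(n_states=4, n_actions=2)
ps_direct, ps_indirect = ParallelSampler(P), ParallelSampler(P)
phased_q_learning(P, R, gamma=0.9, m=50, l=20, ps=ps_direct)
indirect_algorithm(P, R, gamma=0.9, m=50, l=20, ps=ps_indirect)
print(ps_direct.calls)    # 1000 = m * l
print(ps_indirect.calls)  # 50 = m
```

Under these assumptions, Phased-Q-Learning pays for fresh samples in every phase, so its call count grows as $m_D l_D$, while the indirect algorithm reuses one batch of samples across all its iterations, so its call count is $m_I$ regardless of $l_I$.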
Yishay Mansour
2000-05-30