My Results
The article gives no proof of its
theorem, so the core of my project is to supply one.
The bounds I obtain depend on |A| (the size
of the action set), but under the assumption that |A| is constant
they are the same as the article's:
- For an appropriate choice of the parameters m_D and l_D,
the total number of calls to PS(M) required by the Phased Q-Learning
algorithm in order to ensure that, with probability at least 1 − δ,
the expected return of the resulting policy will be within ε
of that of the optimal policy, is:
(3)
- For an appropriate choice of the parameters m_I and l_I,
the total number of calls to PS(M) required by the indirect algorithm in order
to ensure that, with probability at least 1 − δ,
the expected return of the resulting policy will be within ε
of that of the optimal policy, is:
(4)
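To make "the total number of calls to PS(M)" concrete, here is a minimal Python sketch of phased Q-learning that counts those calls. It is an illustration under stated assumptions, not the construction used in the proof: the function name `phased_q_learning` and its arguments are hypothetical, PS(M) is modelled as a callable `sample_next(s, a)` that returns one sampled next state, rewards are assumed known and deterministic, and the update rule is the usual phased one (each phase averages m sampled next-state values for every state-action pair).

```python
def phased_q_learning(n_states, n_actions, reward, sample_next, gamma, m, phases):
    """Sketch of phased Q-learning with a call counter.

    reward(s, a)      -> float, assumed known and deterministic (assumption)
    sample_next(s, a) -> int, one call to the generative model PS(M) (assumption)
    m                 -> samples drawn per state-action pair in each phase
    phases            -> number of phases
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    calls = 0  # total number of calls made to PS(M)
    for _phase in range(phases):
        # Value estimate of the previous phase: V(s) = max_a Q(s, a).
        v = [max(row) for row in q]
        new_q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                total = 0.0
                for _sample in range(m):
                    total += v[sample_next(s, a)]
                    calls += 1
                # Empirical average replaces the true expectation over next states.
                new_q[s][a] = reward(s, a) + gamma * total / m
        q = new_q
    # Greedy policy with respect to the final Q estimate.
    policy = [max(range(n_actions), key=lambda a, s=s: q[s][a])
              for s in range(n_states)]
    return q, policy, calls
```

In this sketch the counter always ends at phases × m × (number of state-action pairs), so a bound on the total number of calls amounts to a bound on how large m and the number of phases must be chosen.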
Yishay Mansour
2000-05-30