Next: My Results Up: Reinforcement Learning - Final Previous: Indirect Algorithm

   
The Main Theorem - Bound on the Number of Samples

The main theorem of the article bounds the number of calls to the subroutine PS(M) that the learning algorithms require in order to ensure, with probability at least $1-\delta$, that the resulting policy is $\varepsilon$-optimal. The bounds stated in the article are:

Theorem 5.1   Main Theorem


Yishay Mansour
2000-05-30