Example showing that the bound is tight
Figure: Example Diagram
- The optimal policy is :
V*(1) = V*(2) = 0
- Let
,
then
- Greedy policy
The return of
is,
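
A minimal numeric sketch of such an example may help. It assumes the bound in question is the usual 2γε/(1-γ) loss bound for a policy that is greedy with respect to an ε-accurate value function, and it uses a standard two-state construction consistent with V*(1) = V*(2) = 0. The rewards, the approximation `v_hat`, the tie-breaking choice, and the function name `greedy_loss` are all assumptions made for illustration, not taken from the figure.

```python
from fractions import Fraction


def greedy_loss(gamma, eps):
    """Loss V*(1) - V^pi(1) of one greedy policy in a hypothetical two-state MDP."""
    # Assumed MDP (not taken from the original figure):
    #   state 2: single action, stays in state 2, reward 0
    #   state 1: "go"   -> state 2, reward 0
    #            "stay" -> state 1, reward -2*gamma*eps
    # Always choosing "go" is optimal, so V*(1) = V*(2) = 0.
    v_star = {1: Fraction(0), 2: Fraction(0)}

    # Assumed approximation whose max-norm error is exactly eps.
    v_hat = {1: v_star[1] + eps, 2: v_star[2] - eps}

    # One-step lookahead values in state 1, computed from v_hat.
    q_go = 0 + gamma * v_hat[2]                    # equals -gamma*eps
    q_stay = -2 * gamma * eps + gamma * v_hat[1]   # equals -gamma*eps
    assert q_go == q_stay  # tie, so "stay" is a legitimate greedy choice

    # Return of always staying: geometric series of the reward -2*gamma*eps.
    v_pi_1 = (-2 * gamma * eps) / (1 - gamma)
    return v_star[1] - v_pi_1


gamma, eps = Fraction(9, 10), Fraction(1, 100)
loss = greedy_loss(gamma, eps)
bound = 2 * gamma * eps / (1 - gamma)
print(loss, bound)  # prints "9/50 9/50"
```

Under these assumptions the two actions tie in the lookahead, so a greedy policy is allowed to pick the bad one, and its loss then equals the bound exactly. Exact fractions are used so the equality is not blurred by floating-point rounding.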
Yishay Mansour
2000-01-11