Example 1
This example is an expansion of Example 2 given in Lecture 3.
We will first examine the value obtained under different return functions by two specific policies:
1. The first policy always chooses a11 when in state s1.
2. The second policy always chooses a12 when in state s1.
Figure: Infinite Horizon Example
Let us start by calculating … :
For … the gap between the two policies goes to 1 in favor of …
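The figure is not reproduced here, so the Python sketch below has to assume specific numbers for the MDP. These numbers are our assumption and do not come from the text. In s1, action a11 gives reward 5 and then moves to s1 or s2 with probability 1/2 each. Action a12 gives reward 10 and moves to s2. State s2 has a single action a21 with reward -1 that stays in s2. The names `MDP`, `POLICY_1`, `POLICY_2`, `expected_rewards` and `finite_horizon_value` are hypothetical. Under that assumption, the sketch propagates the state distribution for N steps, sums the expected immediate rewards of each policy, and prints the gap between them.

```python
# ASSUMED MDP: the original figure is missing, so these numbers are an assumption.
# Each (state, action) maps to (immediate reward, {next_state: probability}).
MDP = {
    ("s1", "a11"): (5.0, {"s1": 0.5, "s2": 0.5}),
    ("s1", "a12"): (10.0, {"s2": 1.0}),
    ("s2", "a21"): (-1.0, {"s2": 1.0}),
}

# The two stationary deterministic policies of the example (state -> action).
POLICY_1 = {"s1": "a11", "s2": "a21"}
POLICY_2 = {"s1": "a12", "s2": "a21"}


def expected_rewards(policy, horizon, start="s1"):
    """E[r_t] for t = 0..horizon-1, obtained by propagating the state distribution."""
    dist = {start: 1.0}
    rewards = []
    for _ in range(horizon):
        step_reward, next_dist = 0.0, {}
        for state, p in dist.items():
            r, transitions = MDP[(state, policy[state])]
            step_reward += p * r
            for nxt, q in transitions.items():
                next_dist[nxt] = next_dist.get(nxt, 0.0) + p * q
        rewards.append(step_reward)
        dist = next_dist
    return rewards


def finite_horizon_value(policy, horizon):
    """Expected total reward collected in the first `horizon` steps, starting at s1."""
    return sum(expected_rewards(policy, horizon))


for n in (1, 2, 5, 10, 20):
    gap = finite_horizon_value(POLICY_1, n) - finite_horizon_value(POLICY_2, n)
    print(f"N={n:2d}  gap (policy 1 - policy 2) = {gap:.6f}")
```

With these assumed numbers, the first policy collects 12(1 - 2^(-N)) - N over N steps and the second collects 11 - N. The gap is therefore 1 - 12 * 2^(-N). For small N the gap is negative, because the larger immediate reward of a12 dominates. As N grows the gap tends to 1 in favor of the first policy, which matches the limit described above.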
The three suggested return functions evaluate to:
1. The expected sum of the immediate rewards: …
2. The expected average reward: …
3. The expected discounted sum: …
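The three criteria can be compared numerically. The figure is missing, so the Python sketch below assumes the same hypothetical numbers for the MDP. These are our assumption, not the lecture's. In s1, a11 gives reward 5 and moves to s1 or s2 with probability 1/2 each. Also in s1, a12 gives reward 10 and moves to s2. In s2, a21 gives reward -1 and stays in s2. The helpers `total_reward`, `average_reward` and `discounted_value`, and the parameter name `discount` for the discount factor, are hypothetical. The discounted sum is computed by iterative policy evaluation, V <- r + discount * P V, which converges for any discount factor below 1.

```python
# ASSUMED MDP: the original figure is missing, so these numbers are an assumption.
# Each (state, action) maps to (immediate reward, {next_state: probability}).
MDP = {
    ("s1", "a11"): (5.0, {"s1": 0.5, "s2": 0.5}),
    ("s1", "a12"): (10.0, {"s2": 1.0}),
    ("s2", "a21"): (-1.0, {"s2": 1.0}),
}
STATES = ("s1", "s2")

# The two stationary deterministic policies of the example (state -> action).
POLICY_1 = {"s1": "a11", "s2": "a21"}
POLICY_2 = {"s1": "a12", "s2": "a21"}


def expected_rewards(policy, horizon, start="s1"):
    """E[r_t] for t = 0..horizon-1, obtained by propagating the state distribution."""
    dist = {start: 1.0}
    rewards = []
    for _ in range(horizon):
        step_reward, next_dist = 0.0, {}
        for state, p in dist.items():
            r, transitions = MDP[(state, policy[state])]
            step_reward += p * r
            for nxt, q in transitions.items():
                next_dist[nxt] = next_dist.get(nxt, 0.0) + p * q
        rewards.append(step_reward)
        dist = next_dist
    return rewards


def total_reward(policy, horizon):
    """1. Expected sum of the immediate rewards over the first `horizon` steps."""
    return sum(expected_rewards(policy, horizon))


def average_reward(policy, horizon):
    """2. Expected average reward over the first `horizon` steps."""
    return total_reward(policy, horizon) / horizon


def discounted_value(policy, discount, start="s1", tol=1e-12):
    """3. Expected discounted sum, via iterative evaluation V <- r + discount * P V."""
    assert 0.0 <= discount < 1.0, "the discounted sum converges only for discount < 1"
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {}
        for s in STATES:
            r, transitions = MDP[(s, policy[s])]
            new_v[s] = r + discount * sum(q * v[n] for n, q in transitions.items())
        # Stop once successive iterates agree to within `tol` in every state.
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v[start]
        v = new_v


for name, pi in (("policy 1", POLICY_1), ("policy 2", POLICY_2)):
    print(name,
          "total(N=1000):", round(total_reward(pi, 1000), 3),
          "average(N=1000):", round(average_reward(pi, 1000), 3),
          "discounted(0.9):", round(discounted_value(pi, 0.9), 3))
```

Under these assumed numbers, both expected total rewards diverge to minus infinity as N grows, and both average rewards tend to -1, so neither criterion separates the two policies in the limit. The discounted sum stays finite for every discount factor below 1. Which policy it prefers depends on the discount factor: the second policy wins at 0.9 and the first wins at 0.95.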
Yishay Mansour
1999-11-18