Example (consider the MDP in Figure 5.1):
For a policy $\pi$ that picks $a_{11}$ in $S_1$ and $a_{21}$ in $S_2$, we compute the following values:
Or in matrix notation:
The solutions are:
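As a minimal sketch of the matrix computation, assume the discounted criterion, under which the values of a fixed policy satisfy $v = r + \lambda P v$, i.e. $(I - \lambda P)\,v = r$. The function name `policy_values`, the discount symbol `lam`, and all numbers below are illustrative assumptions; they are not read from Figure 5.1.

```python
def policy_values(P, r, lam):
    """Solve (I - lam * P) v = r for a two-state policy using Cramer's rule.

    P   : 2x2 transition matrix under the fixed policy (rows sum to 1)
    r   : immediate rewards [r(S1), r(S2)] under the policy
    lam : discount factor, assumed to satisfy 0 <= lam < 1
    """
    # Entries of the matrix A = I - lam * P
    a = 1.0 - lam * P[0][0]
    b = -lam * P[0][1]
    c = -lam * P[1][0]
    d = 1.0 - lam * P[1][1]
    det = a * d - b * c  # nonzero whenever lam < 1 and P is stochastic
    v1 = (r[0] * d - b * r[1]) / det
    v2 = (a * r[1] - c * r[0]) / det
    return v1, v2


# Hypothetical two-state policy (placeholder numbers, not from the figure)
P = [[0.5, 0.5],
     [0.0, 1.0]]
r = [5.0, -1.0]
lam = 0.9

v1, v2 = policy_values(P, r, lam)
print("v(S1) =", v1)
print("v(S2) =", v2)
```

A quick sanity check is to substitute the result back into $v = r + \lambda P v$ for each state; both sides should agree up to rounding.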
Yishay Mansour
1999-11-24