Next: A Comparison between VI
Up: Policy Iteration
Previous: Convergence of Policy Iteration
Example: Running Policy Iteration Algorithm
(Consider the MDP in figure
)
Let:
Step 1:
Policy Evaluation:
Policy Improvement:
Therefore:
Policy Evaluation:
The next policy improvement step shows that d2 = d1, and
therefore the algorithm terminates and outputs d1 as the
optimal policy.
Yishay Mansour
1999-12-18