Next: Example: Running Policy Iteration
Up: Policy Iteration
Previous: Policy Iteration Algorithm
Convergence of Policy Iteration Algorithm
We shall
see that when the number of states - |S|, and the number of
actions - |A| are finite, there are no two
non-consecutive iterations with the same policy, unless we have an optimal policy.
Therefore dn converges to the optimal policy
in a finite number of
steps. The key to the convergence of dn is the monotonicity of .
Claim 6.4
Let
vn,
vn+1 be the values of consecutive iterations of the above
algorithm, then
.
Proof:Let dn+1 be the policy in the policy improvement step, then
Therefore:
.
Multiplying by
we get:
Note: The above claim is valid even when S or A are infinite.
Theorem 6.5
Let
S and
A be finite,
then the
Policy Iteration algorithm converges to the optimal
policy after at most |
A|
|S| iterations.
Proof:Clearly,
.
According to claim , in each step
except for the
last step in which
vn+1 = vn. Therefore no policy
can appear in two different iterations. Hence the number
of iterations
Next: Example: Running Policy Iteration
Up: Policy Iteration
Previous: Policy Iteration Algorithm
Yishay Mansour
1999-12-18