Next: Linear Programming
Up: No Title
Previous: Example: Running Policy Iteration
A Comparison between VI and PI
Algorithms
In this section we will compare the convergence rate
of the VI and PI algorithms. It will be shown that, assuming that
the two algorithms begin with the same approximated value, the PI
algorithm converges faster to the optimized value.
Theorem 6.6
Let
be the series of
values created by the VI algorithm (where
un+1=
Lun) and
let
be the series of values created by PI algorithm.
If
u0=
v0, then
.
Proof:We will use induction to prove the theorem.
Induction Basis: We assume that u0=v0. v0 is the
return value of a specific policy, and therefore it is clearly
the optimal return value. Therefore:
.
Induction Step:
From the induction hypothesis
,
and since
Lpn is monotonic it follows that:
Since L is taking the maximum over all policies:
We denote the policy determined by PI algorithm in iteration nas dn and therefore:
Lvn=Ldnvn
From the Optimality Equations we get:
From the definition of vn+1 we have:
And we get
.
In Theorem
it was proven that
and therefore
.
From Theorem it follows
that, assuming the same starting point, PI algorithm requires less
stages then VI algorithm to converge to the optimal policy. Yet,
it should be noticed that each single stage of PI requires a
solution of a set of linear equations (the policy evaluation
stage) and therefore it is computationally more expensive than a
single stage of VI algorithm.
Next: Linear Programming
Up: No Title
Previous: Example: Running Policy Iteration
Yishay Mansour
1999-12-18