Example: Running Value Iteration Algorithm

Consider the MDP in figure [*]: state $s_1$ has two actions, $a_{11}$ (reward $5$, moving to $s_1$ or $s_2$ with probability $\frac{1}{2}$ each) and $a_{12}$ (reward $10$, moving to $s_2$ with probability $1$); state $s_2$ has a single action $a_{21}$ (reward $-1$, staying in $s_2$).
Let:

\begin{eqnarray*}
\lambda & = & \frac{1}{2}\\
{v_{n+1}}({s_1}) & = & MAX\{ 5 + \lambda({\textstyle\frac{1}{2}}{v_n}({s_1}) + {\textstyle\frac{1}{2}}{v_n}({s_2})),\ 10 + \lambda{v_n}({s_2}) \}\\
{v_{n+1}}({s_2}) & = & -1 + \lambda{v_n}({s_2})
\end{eqnarray*}
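
As a sanity check, here is a minimal Python sketch of synchronous value iteration for this MDP (our own illustration, not part of the original notes; the names LAMBDA and backup are ours, and the rewards and transition probabilities are read off the update equations above):

\begin{verbatim}
# A minimal sketch of synchronous value iteration for the two-state MDP.
LAMBDA = 0.5  # discount factor lambda

def backup(v1, v2):
    # One Bellman backup per state; returns (v_{n+1}(s1), v_{n+1}(s2)).
    a11 = 5 + LAMBDA * (0.5 * v1 + 0.5 * v2)  # action a11: reward 5, split
    a12 = 10 + LAMBDA * v2                    # action a12: reward 10, to s2
    a21 = -1 + LAMBDA * v2                    # action a21: reward -1, stay
    return max(a11, a12), a21

v1, v2 = -10.0, -10.0  # Step 1: initialization
for n in range(1, 5):  # Steps 2-5 below
    v1, v2 = backup(v1, v2)
    print("v_%d(s1) = %.2f, v_%d(s2) = %.2f" % (n, v1, n, v2))
# prints 5/-6, 7/-4, 8/-3, 8.5/-2.5, approaching 9 and -2
\end{verbatim}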


Step 1: Initialize:

\begin{eqnarray*}
{v_0}({s_1}) & = & {v_0}({s_2}) = -10
\end{eqnarray*}


Steps 2-5:

\begin{eqnarray*}
{v_1}({s_2}) & = & -1 + 0.5\cdot(-10) = -6\\
{v_1}({s_1}) & = & MAX\{ 5 + 0.25\cdot(-10) + 0.25\cdot(-10),\ 10 + 0.5\cdot(-10)\} = 5\\
{v_2}({s_2}) & = & -1 + 0.5\cdot(-6) = -4\\
{v_2}({s_1}) & = & MAX\{ 5 + 0.25\cdot 5 + 0.25\cdot(-6),\ 10 + 0.5\cdot(-6)\} = 7\\
{v_3}({s_2}) & = & -1 + 0.5\cdot(-4) = -3\\
{v_3}({s_1}) & = & MAX\{ 5 + 0.25\cdot 7 + 0.25\cdot(-4),\ 10 + 0.5\cdot(-4)\} = 8\\
{v_4}({s_2}) & = & -1 + 0.5\cdot(-3) = -2.5\\
{v_4}({s_1}) & = & MAX\{ 5 + 0.25\cdot 8 + 0.25\cdot(-3),\ 10 + 0.5\cdot(-3)\} = 8.5
\end{eqnarray*}
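
Note the geometric convergence: since the update for $s_2$ is linear, its error contracts by exactly $\lambda = \frac{1}{2}$ per iteration:

\begin{eqnarray*}
{v_n}({s_2}) - v^*_\lambda({s_2}) & = & -8,\ -4,\ -2,\ -1,\ -{\textstyle\frac{1}{2}},\ \ldots \qquad (n = 0,1,2,3,4,\ldots)
\end{eqnarray*}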


Step 6: The resulting policy (greedy with respect to the final $v_n$) is:

\begin{eqnarray*}
\pi_{\epsilon}(s_{1}) & = & a_{12}\\
\pi_{\epsilon}(s_{2}) & = & a_{21}
\end{eqnarray*}
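
The extraction step, continuing the Python sketch above (again our own illustration; the action names are the figure's $a_{11}, a_{12}, a_{21}$):

\begin{verbatim}
def greedy_policy(v1, v2):
    # Pick, in each state, the action attaining the maximum in the backup.
    a11 = 5 + LAMBDA * (0.5 * v1 + 0.5 * v2)
    a12 = 10 + LAMBDA * v2
    return ("a11" if a11 > a12 else "a12"), "a21"  # s2 has only a21

print(greedy_policy(8.5, -2.5))  # -> ('a12', 'a21')
\end{verbatim}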


Note that the iterated values approach $v^*_\lambda$, which is:

\begin{eqnarray*}
v^*_\lambda({s_2}) & = & -2\\
v^*_\lambda({s_1}) & = & 9
\end{eqnarray*}
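
Indeed, these values are the fixed point of the update equations: for $s_2$, $v^*_\lambda(s_2) = -1 + \frac{1}{2}v^*_\lambda(s_2)$ gives $v^*_\lambda(s_2) = -2$, and substituting into the update for $s_1$:

\begin{eqnarray*}
v^*_\lambda({s_1}) & = & MAX\{ 5 + 0.25\cdot 9 + 0.25\cdot(-2),\ 10 + 0.5\cdot(-2)\}\\
 & = & MAX\{ 6.75,\ 9\} \ = \ 9
\end{eqnarray*}

so $a_{12}$ is indeed the optimal action at $s_1$.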



