
   
Example:

(Consider the MDP in figure 5.1)

For a policy $\pi$ that picks $a_{11}$ in $S_1$ and $a_{21}$ in $S_2$, we compute the following values:

\begin{eqnarray*}V({S_1})& = & 5 + \lambda[\frac{1}{2}V({S_1}) +
\frac{1}{2}V({S_2})]\\ V({S_2}) & = & -1 + \lambda[1\cdot
V({S_2})]
\end{eqnarray*}


Or in matrix notation:

$\vec{v} =
\left(\begin{array}{c}5\\ -1\end{array}\right) +
\lambda\left(\begin{array}{cc}
\frac{1}{2} & \frac{1}{2} \\
0 & 1
\end{array}\right)\vec{v}$
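
As a sketch of the intermediate step, with $\vec{v} = (V(S_1), V(S_2))^T$ and assuming $0 \le \lambda < 1$ so that the matrix on the left is invertible, collecting the $\vec{v}$ terms on one side gives the linear system:

\begin{eqnarray*}
\left(\begin{array}{cc}
1-\frac{\lambda}{2} & -\frac{\lambda}{2} \\
0 & 1-\lambda
\end{array}\right)\vec{v} & = &
\left(\begin{array}{c}5\\ -1\end{array}\right)
\end{eqnarray*}

The second row involves only $V(S_2)$, so it can be solved first and its value substituted into the first row.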

Solving the second equation first and substituting into the first gives:

\begin{eqnarray*}
V({S_2}) & = & -\frac{1}{1-\lambda}\\
V({S_1}) & = & \frac{5-\frac{\frac{\lambda}{2}}{1-\lambda}}{1-\frac{\lambda}{2}}
\end{eqnarray*}
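
A numerical check can make the example concrete. The following is a minimal sketch, not part of the original notes: it solves $(I - \lambda P)\vec{v} = \vec{r}$ with exact rational arithmetic for the rewards and transition matrix of the policy above. The function name `evaluate_policy` and the choice $\lambda = \frac{1}{2}$ are assumptions made purely for illustration.

```python
from fractions import Fraction


def evaluate_policy(rewards, transitions, lam):
    """Solve (I - lam * P) v = r by Gauss-Jordan elimination.

    Hypothetical helper for illustration; assumes the system is
    non-singular, which holds for 0 <= lam < 1.
    """
    n = len(rewards)
    # Build the augmented matrix [I - lam * P | r].
    a = [
        [(1 if i == j else 0) - lam * transitions[i][j] for j in range(n)]
        + [rewards[i]]
        for i in range(n)
    ]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next(row for row in range(col, n) if a[row][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        # Normalize the pivot row so the pivot entry becomes 1.
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        # Eliminate this column from every other row.
        for row in range(n):
            if row != col and a[row][col] != 0:
                f = a[row][col]
                a[row] = [x - f * y for x, y in zip(a[row], a[col])]
    return [a[i][n] for i in range(n)]


# Assumed discount factor, chosen only for this illustration.
lam = Fraction(1, 2)
# Immediate rewards and transition probabilities under the policy pi.
r = [Fraction(5), Fraction(-1)]
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(0), Fraction(1)]]

v1, v2 = evaluate_policy(r, P, lam)
print(v1, v2)  # prints: 6 -2
```

For $\lambda = \frac{1}{2}$ this yields $V(S_1) = 6$ and $V(S_2) = -2$, and substituting these values back into the two equations for $V(S_1)$ and $V(S_2)$ confirms that both hold.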





Yishay Mansour
1999-11-24