
   
The Return Function


\begin{eqnarray*}
V_N^{\pi}(s) & = & \sum_{t=1}^{N-1}\sum_{j \in S}\sum_{a \in A_j} r_t(j,a)\cdot Prob[X_t=j, Y_t=a \vert X_1=s] \\
& & {} + \sum_{j \in S}\sum_{a \in A_j} r_N(j)\cdot Prob[X_N=j, Y_N=a \vert X_1=s]
\end{eqnarray*}
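The equation above shows that $V_N^{\pi}(s)$ depends on the policy only through the joint distributions $Prob[X_t=j, Y_t=a \vert X_1=s]$. A minimal Python sketch of this computation, where the occupancy dictionaries and reward functions are illustrative assumptions rather than anything given in the text:

```python
# Finite-horizon return V_N^pi(s), computed directly from the joint
# distributions Prob[X_t = j, Y_t = a | X_1 = s] induced by a policy.
# occupancy[t] maps (state, action) -> probability at step t (t = 1..N);
# r(t, j, a) is the step reward; at step N only the state reward r_N(j) is paid.

def finite_horizon_return(occupancy, r, r_final, N):
    total = 0.0
    for t in range(1, N):
        for (j, a), p in occupancy[t].items():
            total += r(t, j, a) * p
    # terminal step: the reward depends only on the state reached
    for (j, a), p in occupancy[N].items():
        total += r_final(j) * p
    return total

# Tiny illustrative example: one start state, horizon N = 2.
occ = {
    1: {("s0", "a0"): 0.5, ("s0", "a1"): 0.5},
    2: {("s1", "a0"): 1.0},
}
r = lambda t, j, a: 1.0 if a == "a0" else 0.0
r_final = lambda j: 2.0
print(finite_horizon_return(occ, r, r_final, 2))  # 0.5*1.0 + 2.0 = 2.5
```

Since only the occupancy probabilities enter the sum, any two policies that induce the same distributions yield the same value here.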


Therefore:
1.
$\forall N\; V_N^{\pi}(s) = V_N^{\pi^{'}}(s)$,
since we proved in the previous theorem that the distribution $Prob[X_t=j, Y_t=a \vert X_1=s]$ is the same under $\pi$ and $\pi^{'}$.
2.
$g^{\pi}(s) = g^{\pi^{'}}(s)$,
since (1) holds for every $N$.
3.
$V_{\lambda}^{\pi}(s) = V_{\lambda}^{\pi^{'}}(s)$
$V_{\lambda}^{\pi}(s) = V_{\lambda}^{\pi^{'}}(s)$.
One should note that it is impossible to prove Theorem [*] for history-dependent and Markovian deterministic policies, since it is the randomization of the policy that allows all the histories to be modeled under a single state.
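Claims (2) and (3) follow because both the average reward $g^{\pi}(s)$ and the discounted return $V_{\lambda}^{\pi}(s)$ are functionals of the same per-step expected rewards: if the expectations agree at every step, both quantities coincide. A hedged sketch of this, where the expected-reward sequence is an illustrative assumption:

```python
# g^pi(s) and V_lambda^pi(s) as functionals of the per-step expected
# rewards E[r_t(X_t, Y_t) | X_1 = s]; equality of these expectations at
# every t (claim 1, for all N) forces equality of both quantities.

def average_reward(expected_r, N):
    # finite-N approximation of g = lim_{N -> inf} (1/N) * V_N
    return sum(expected_r(t) for t in range(1, N + 1)) / N

def discounted_return(expected_r, lam, N):
    # V_lambda = sum_t lam^(t-1) * E[r_t], truncated at horizon N
    return sum(lam ** (t - 1) * expected_r(t) for t in range(1, N + 1))

exp_r = lambda t: 1.0  # constant expected reward per step (illustrative)
print(average_reward(exp_r, 1000))        # 1.0
print(discounted_return(exp_r, 0.5, 60))  # ~ 2.0 (geometric series)
```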



Yishay Mansour
1999-11-18