Summary


Combining Theorem 3.2 and Theorem 4.1 gives:

$\begin{displaymath}V_N^*(s) = \max_{\pi \in {\Pi}^{HR}} \{V_N^{\pi}(s)\} = \max_{\pi \in {\Pi}^{MD}} \{V_N^{\pi}(s)\}\end{displaymath}$

That is, an optimal policy can always be chosen from the set of Markovian deterministic policies: widening the search to history-dependent randomized policies does not increase the optimal value.
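The equality can be sanity-checked numerically on a toy problem. The sketch below is only an illustration, not a proof. It makes several assumptions that are not in the source: a hypothetical 2-state, 2-action MDP (the tables `P` and `R`), horizon `N = 3`, an undiscounted sum of rewards, and a terminal reward of zero. It enumerates every Markovian deterministic policy by brute force, then checks that randomly sampled Markovian randomized policies never achieve a higher value. History-dependent policies are not sampled here, so this covers only part of the claim.

```python
import itertools
import random

# Hypothetical toy MDP (an assumption for illustration, not from the source).
# P[s][a][s2] = probability of moving from s to s2 under action a.
# R[s][a]     = immediate reward for taking action a in state s.
P = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
     1: {0: [0.5, 0.5], 1: [0.0, 1.0]}}
R = {0: {0: 1.0, 1: 0.0},
     1: {0: 2.0, 1: 3.0}}
STATES, ACTIONS, N = [0, 1], [0, 1], 3


def evaluate(policy):
    """Expected N-step return of a Markovian policy, one value per start state.

    policy[t][s] is a list of action probabilities at time t in state s.
    Assumes an undiscounted sum of rewards and zero terminal reward.
    """
    v = [0.0 for _ in STATES]
    for t in reversed(range(N)):
        v = [
            sum(
                policy[t][s][a]
                * (R[s][a] + sum(P[s][a][s2] * v[s2] for s2 in STATES))
                for a in ACTIONS
            )
            for s in STATES
        ]
    return v


def best_markov_deterministic():
    """Brute-force max of V_N^pi(s) over all Markovian deterministic policies."""
    # A decision rule maps each state to one action.
    rules = list(itertools.product(ACTIONS, repeat=len(STATES)))
    best = [float("-inf")] * len(STATES)
    for choice in itertools.product(rules, repeat=N):
        policy = [
            [[1.0 if a == rule[s] else 0.0 for a in ACTIONS] for s in STATES]
            for rule in choice
        ]
        best = [max(b, x) for b, x in zip(best, evaluate(policy))]
    return best


def random_markov_randomized(rng):
    """Sample a Markovian randomized policy (two actions per state)."""
    return [[[p, 1.0 - p] for p in (rng.random() for _ in STATES)]
            for _ in range(N)]


rng = random.Random(0)
v_md = best_markov_deterministic()
for _ in range(1000):
    v = evaluate(random_markov_randomized(rng))
    # No sampled randomized policy should beat the best deterministic one.
    assert all(x <= b + 1e-9 for x, b in zip(v, v_md))
```

Brute-force enumeration is used deliberately so that the check does not rely on any particular construction of the optimal policy; it is feasible only because the toy problem has 4 decision rules and 3 time steps, that is, 64 deterministic policies.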

Yishay Mansour
1999-11-18