An Algorithm for Constructing an Optimal Policy

Next: Computational Complexity Up: Finite Horizon Previous: Summary

An Algorithm for Constructing an Optimal Policy

In this section we develop an optimal Markovian deterministic policy. As shown in the previous section, by this we achieve a general optimal policy. The algorithm construction is done from t=n to t=1 in a recursive manner.
The algorithm:

1.

For $t \leftarrow N$ , $\forall s_N, U_N(s_N) = r_N(s_N)$

2.

For $t \leftarrow t-1$

$\begin{displaymath}U_t(s_t) = \max_{ a \in A}\{r_t(s_t, a) + \sum_{j\in S} P_t(j\vert s_t,a)U_{t+1}(j)\}\end{displaymath}$

and the optimal group of actions is:

$\begin{displaymath}A_{s_t}^*(t) = argmax_{ a \in A} \{r_t(s_t,a) + \sum_{j\in S} P_t(j\vert s_t,a)U_{t+1}^*(j)\}\end{displaymath}$

The generated policy, ${\pi}^*$ , satisfying $\forall t, \forall s_t, {\pi}_t^*(s_t) \in A_{s_t}^*$ is an optimal policy.

Computational Complexity

Yishay Mansour
1999-11-18