Theorem 4.1
Let $U_t^*$, $t = 1, \dots, N$, be a solution to the optimality equations. Then:
1. For every $t$, $U_t^*(h_t)$ depends on the history $h_t$ only through the current state $s_t$.
2. There exists a Markovian deterministic optimal policy.
Proof: We use backward induction on $t$ to prove (1).
Basis: For $t = N$, $U_N^*(h_N) = r_N(s_N)$; therefore $U_N^*(h_N) = U_N^*(s_N)$.
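Induction step: Assume that $U_t^*(h_t) = U_t^*(s_t)$ for $t = n+1, \dots, N$; we prove it for $t = n$. By the optimality equations,
$$U_n^*(h_n) = \max_{a \in A_{s_n}} \Big\{ r_n(s_n, a) + \sum_{j \in S} p_n(j \mid s_n, a)\, U_{n+1}^*(h_n, a, j) \Big\},$$
and by the induction hypothesis $U_{n+1}^*(h_n, a, j) = U_{n+1}^*(j)$, so
$$U_n^*(h_n) = \max_{a \in A_{s_n}} \Big\{ r_n(s_n, a) + \sum_{j \in S} p_n(j \mid s_n, a)\, U_{n+1}^*(j) \Big\}.$$
The term in braces depends only on $s_n$ and $a$, so its maximum over $a \in A_{s_n}$ depends only on $s_n$. Thus $U_n^*(h_n) = U_n^*(s_n)$, which completes the induction: $U_t^*(h_t) = U_t^*(s_t)$ for every $t$.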
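Claim (1) can be sanity-checked numerically. The sketch below assumes a small hypothetical MDP (two states, two actions, horizon $N = 3$, with made-up rewards `R`, `R_N` and transitions `P` that are not from these notes). It evaluates the optimality equations over complete histories $h_t = (s_1, a_1, \dots, s_t)$ by brute force and asserts that every history ending in the same state $s_t$ receives the same value. Because it enumerates all histories, it is only practical for tiny examples.

```python
from itertools import product
import math

# Hypothetical toy MDP (not from the notes): 2 states, 2 actions, horizon N = 3.
# Rewards and transitions are time-homogeneous here only for brevity.
STATES = (0, 1)
ACTIONS = (0, 1)
N = 3
R = [[1.0, 0.0], [0.0, 2.0]]                    # r_t(s, a) = R[s][a]
R_N = [0.0, 3.0]                                 # terminal reward r_N(s)
P = {(0, 0): (0.9, 0.1), (0, 1): (0.2, 0.8),
     (1, 0): (0.5, 0.5), (1, 1): (0.7, 0.3)}     # P[(s, a)][j] = p_t(j | s, a)


def U(t, h):
    """Optimality equations written over full histories h = (s_1, a_1, ..., s_t)."""
    s = h[-1]
    if t == N:
        return R_N[s]
    return max(
        R[s][a] + sum(P[(s, a)][j] * U(t + 1, h + (a, j)) for j in STATES)
        for a in ACTIONS
    )


def histories(t):
    """Yield every history (s_1, a_1, s_2, ..., a_{t-1}, s_t)."""
    for s1 in STATES:
        for steps in product(product(ACTIONS, STATES), repeat=t - 1):
            h = (s1,)
            for a, s in steps:
                h += (a, s)
            yield h


def state_values():
    """Return {(t, s): U_t*(s)}, checking claim (1) for every history."""
    table = {}
    for t in range(1, N + 1):
        for h in histories(t):
            val = U(t, h)
            key = (t, h[-1])  # keep only the current state s_t
            if key in table:
                # Claim (1): all histories ending in s_t give the same value.
                assert math.isclose(table[key], val), (key, table[key], val)
            else:
                table[key] = val
    return table


print(state_values())
```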
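To prove (2), let $\pi^* = (d_1^*, \dots, d_{N-1}^*)$ be a Markovian deterministic policy that satisfies
$$d_t^*(s_t) \in \arg\max_{a \in A_{s_t}} \Big\{ r_t(s_t, a) + \sum_{j \in S} p_t(j \mid s_t, a)\, U_{t+1}^*(j) \Big\}, \qquad t = 1, \dots, N-1.$$
Since each decision rule $d_t^*$ depends on the history only through the current state $s_t$, $\pi^*$ is a Markovian policy.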
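The construction of $\pi^*$ can be illustrated on a hypothetical toy MDP; the names `R`, `R_N`, `P`, `q`, `backward_induction` and `evaluate` are illustrative and not from these notes. The sketch assumes finite state and action sets, so each argmax exists, and breaks ties toward the lowest-numbered action because Python's `max` returns the first maximizer. It computes $U_t^*(s)$ by backward induction, reads off the greedy decision rules $d_t^*$, and checks that $\pi^*$ attains $U_1^*$. The final loop only compares $\pi^*$ against the 16 Markovian deterministic policies of this example, so it is a sanity check, not a proof of optimality over history-dependent policies.

```python
from itertools import product
import math

# Hypothetical toy MDP (not from the notes): 2 states, 2 actions, horizon N = 3.
STATES = (0, 1)
ACTIONS = (0, 1)
N = 3
R = [[1.0, 0.0], [0.0, 2.0]]                    # r_t(s, a) = R[s][a]
R_N = [0.0, 3.0]                                 # terminal reward r_N(s)
P = {(0, 0): (0.9, 0.1), (0, 1): (0.2, 0.8),
     (1, 0): (0.5, 0.5), (1, 1): (0.7, 0.3)}     # P[(s, a)][j] = p_t(j | s, a)


def q(s, a, V_next):
    """r_t(s, a) + sum_j p_t(j | s, a) * V_next[j]."""
    return R[s][a] + sum(P[(s, a)][j] * V_next[j] for j in STATES)


def backward_induction():
    """Return U*[t][s] for t = 1..N and greedy decision rules d*[t][s] for t < N."""
    U = {N: list(R_N)}
    d = {}
    for t in range(N - 1, 0, -1):
        # d_t*(s) is an argmax of the bracketed term; it looks only at s.
        d[t] = [max(ACTIONS, key=lambda a: q(s, a, U[t + 1])) for s in STATES]
        U[t] = [q(s, d[t][s], U[t + 1]) for s in STATES]
    return U, d


def evaluate(policy):
    """Expected total reward of a Markov deterministic policy {t: [action per state]}."""
    V = list(R_N)
    for t in range(N - 1, 0, -1):
        V = [q(s, policy[t][s], V) for s in STATES]
    return V


U_star, d_star = backward_induction()
# pi* = (d_1*, ..., d_{N-1}*) attains U_1* from every starting state ...
assert all(math.isclose(v, u) for v, u in zip(evaluate(d_star), U_star[1]))
# ... and no Markov deterministic policy (4 rules per epoch here) does better.
rules = list(product(ACTIONS, repeat=len(STATES)))
for choice in product(rules, repeat=N - 1):
    V = evaluate({t: list(rule) for t, rule in enumerate(choice, start=1)})
    assert all(v <= u + 1e-12 for v, u in zip(V, U_star[1]))
print(d_star, U_star[1])
```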