We will now show that for every initial state $s$ and every history-dependent policy $\pi$, there exists a Markovian policy $\pi'$ such that the distribution of $(X_t, Y_t)$ is the same under $\pi$ and $\pi'$.
Theorem 4.3
Let $\pi$ be a history-dependent stochastic policy. Then there exists a Markovian stochastic policy $\pi'$ that satisfies
$$\Pr^{\pi}[X_t = x, Y_t = a] = \Pr^{\pi'}[X_t = x, Y_t = a] \qquad \forall t \ge 0,\ x \in X,\ a \in A.$$
Proof: For every $t \ge 0$, $x \in X$ and $a \in A$ we define $\pi'$ as follows:
$$\pi'_t(a \mid x) := \Pr^{\pi}[Y_t = a \mid X_t = x]$$
(when $\Pr^{\pi}[X_t = x] = 0$, the distribution $\pi'_t(\cdot \mid x)$ may be chosen arbitrarily).
We will first show that this definition results in the same distribution over the set of actions:
$$\Pr^{\pi'}[Y_t = a \mid X_t = x] = \pi'_t(a \mid x) = \Pr^{\pi}[Y_t = a \mid X_t = x].$$
The first equality follows from the fact that $\pi'$ is Markovian; the second holds by definition.
It remains to show that the distribution over the set of states is equal under $\pi$ and $\pi'$, i.e.,
$$\Pr^{\pi}[X_t = x] = \Pr^{\pi'}[X_t = x] \qquad \forall t \ge 0,\ x \in X.$$
We will prove this part by induction on $t$. The idea behind the proof is that if at a certain step the two distributions over the set of states agree, and the same stochastic action rule is applied, then we end up with the same distribution over the set of states at the next step.
Basis: for $t = 0$ we have $\Pr^{\pi}[X_0 = s] = \Pr^{\pi'}[X_0 = s] = 1$, since under both $\pi$ and $\pi'$ the process starts at the initial state $s$.
Induction step: We assume that the distributions over the set of states under $\pi$ and $\pi'$ are identical up to time $t-1$, i.e., $\Pr^{\pi}[X_{t-1} = x] = \Pr^{\pi'}[X_{t-1} = x]$ for every $x \in X$.
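From the induction hypothesis, the step for time $t$ can be completed along the following lines; this is a sketch of the standard computation, assuming a transition kernel written $p(x \mid x', a)$ (a notational assumption, since the excerpt does not fix a symbol for the transition probabilities):

```latex
\begin{align*}
\Pr^{\pi'}[X_t = x]
  &= \sum_{x' \in X}\sum_{a \in A} \Pr^{\pi'}[X_{t-1} = x']\,\pi'_{t-1}(a \mid x')\,p(x \mid x', a) \\
  &= \sum_{x' \in X}\sum_{a \in A} \Pr^{\pi}[X_{t-1} = x']\,\Pr^{\pi}[Y_{t-1} = a \mid X_{t-1} = x']\,p(x \mid x', a) \\
  &= \sum_{x' \in X}\sum_{a \in A} \Pr^{\pi}[X_{t-1} = x', Y_{t-1} = a]\,p(x \mid x', a)
   \;=\; \Pr^{\pi}[X_t = x],
\end{align*}
```

where the second equality uses the induction hypothesis together with the definition of $\pi'_{t-1}$, and the last uses the law of total probability over the state-action pair at time $t-1$.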