Calculating the Return Value of a Given Policy
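According to Theorem 4.3 from the previous lecture, for each
stochastic history-dependent policy $\pi \in \Pi^{HR}$
there exists a Markovian stochastic policy $\pi' \in \Pi^{MR}$
that has the same
return, i.e.,
$v_\lambda^{\pi'}(s) = v_\lambda^{\pi}(s)$ for every initial state $s$, where $\lambda$ is the discount factor.
It therefore suffices to compute the return of Markovian policies.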
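Let
$\pi = (d_1, d_2, \ldots) \in \Pi^{MR}$,
then
$v_\lambda^{\pi} = r_{d_1} + \lambda P_{d_1} v_\lambda^{\pi'}$
(where $\pi' = (d_2, d_3, \ldots)$
is similar to policy $\pi$
starting from the
second step). Here $r_{d_1}$ is the vector of expected immediate rewards and $P_{d_1}$ is the transition matrix under the decision rule $d_1$.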
If $\pi$
is stationary then
$\pi = \pi' = (d, d, \ldots)$
and
$v_\lambda^{\pi} = r_d + \lambda P_d v_\lambda^{\pi}$.
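All the parameters aside from $v_\lambda^{\pi}$
are known,
thus we have a set of linear equations of the form
$v = r_d + \lambda P_d v$,
with one equation and one unknown per state.
We will show that
these equations have a single solution which is
$v = v_\lambda^{\pi} = (I - \lambda P_d)^{-1} r_d$.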
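As a minimal sketch of how such a system can be solved numerically, the Python code below evaluates a stationary policy on a hypothetical two-state example by solving $(I - \lambda P_d) v = r_d$ with Gaussian elimination. The rewards, transition probabilities, discount factor 0.9, and all function names are illustrative assumptions, not taken from the lecture.

```python
# Hypothetical two-state example; all numbers are illustrative assumptions.
DISCOUNT = 0.9                      # discount factor (lambda in the text)
REWARD = [1.0, 0.0]                 # r_d(s): expected one-step reward in state s
TRANSITION = [[0.5, 0.5],           # P_d(j | s): row s, column j
              [0.2, 0.8]]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs are not modified.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def policy_value(reward, transition, discount):
    """Return v solving (I - discount * P) v = r."""
    n = len(reward)
    a = [[(1.0 if i == j else 0.0) - discount * transition[i][j]
          for j in range(n)] for i in range(n)]
    return solve_linear(a, reward)


def residual(v, reward, transition, discount):
    """Largest absolute violation of v = r + discount * P v."""
    n = len(v)
    return max(
        abs(reward[s]
            + discount * sum(transition[s][j] * v[j] for j in range(n))
            - v[s])
        for s in range(n)
    )


value = policy_value(REWARD, TRANSITION, DISCOUNT)
print("value per state:", value)
print("fixed-point residual:", residual(value, REWARD, TRANSITION, DISCOUNT))
```

The residual check substitutes the computed vector back into the fixed-point equation, so a value near zero indicates that the solution of the linear system is also the return of the stationary policy in this example.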
Yishay Mansour
1999-11-24