where,
Xt- State in time t
Yt- Action in time t
Under the assumption that
M>|rt(s,a)| for each
and that A and S are discrete then
is well defined for all
.
(
is the group of stochastic, history-depended decision rules). Namely, for each decision rule there is a well defined value.