where,
Xt- State in time t
Yt- Action in time t
Under the assumption that M>|rt(s,a)| for each and that A and S are discrete then is well defined for all . ( is the group of stochastic, history-depended decision rules). Namely, for each decision rule there is a well defined value.