1. The agent receives an immediate reward Rt(s,a). We denote the expectation of Rt by rt(s,a) = E[Rt(s,a)].
2. The system transitions to a new state s', drawn according to the transition probability Pt(s'|s,a). We assume that Pt is well defined, that is, for every s ∈ S and a ∈ A, ∑_{s'∈S} Pt(s'|s,a) = 1.
We will not discuss how or when the immediate reward reaches the agent. It may be accumulated over the time frame [t, t+1], or it may be given at a single point in time between t and t+1. In any case, all that matters to the agent is that the immediate reward reaches it before time t+1.
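As a concrete illustration, the sketch below samples one step of these dynamics: the agent receives a draw of the immediate reward Rt(s,a), and the system moves to a state s' drawn from Pt(.|s,a). The names step, sample_reward, and transition_probs are assumptions made for this example and are not part of the definition above.

```python
import random

def step(t, s, a, transition_probs, sample_reward):
    """Perform one step of the process at time t from state s with action a.

    transition_probs(t, s, a) -> dict mapping each next state s' to Pt(s'|s,a)
    sample_reward(t, s, a)    -> a draw of the immediate reward Rt(s,a)
    """
    reward = sample_reward(t, s, a)        # immediate reward Rt(s,a)
    probs = transition_probs(t, s, a)      # the distribution Pt(.|s,a)
    next_states = list(probs.keys())
    weights = list(probs.values())
    s_next = random.choices(next_states, weights=weights)[0]  # s' ~ Pt(.|s,a)
    return reward, s_next
```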
A Markovian process is defined as a process in which the only information needed from the history is the current state. We define a Markovian process by the tuple (T, S, A, Pt(.|s,a), Rt(s,a)). The process defined here is Markovian, since the subsequent states and immediate rewards (and therefore the whole continuation of the process) depend only on the current state and the chosen action, not on the history.
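To make the tuple concrete, the following is a minimal sketch of how (T, S, A, Pt(.|s,a), Rt(s,a)) might be represented in code, together with the well-definedness check that each Pt(.|s,a) sums to 1 over s'. The class name MarkovianProcess, its field names, and the check_well_defined helper are assumptions of this sketch, not notation from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class MarkovianProcess:
    horizon: int                                     # T: number of time steps
    states: List[str]                                # S: state space
    actions: List[str]                               # A: action space
    P: Callable[[int, str, str], Dict[str, float]]  # Pt(.|s,a): distribution over next states
    R: Callable[[int, str, str], float]             # a draw of the immediate reward Rt(s,a)

    def check_well_defined(self, tol: float = 1e-9) -> bool:
        """Check that for every t, s, a the probabilities Pt(.|s,a) sum to 1."""
        return all(
            abs(sum(self.P(t, s, a).values()) - 1.0) <= tol
            for t in range(self.horizon)
            for s in self.states
            for a in self.actions
        )
```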