1. The agent receives an immediate reward Rt(s,a). We denote the expectation of Rt by rt(s,a) = E[Rt(s,a)].
2. The system transitions to a new state s', drawn according to the transition probability Pt(s'|s,a). We assume that Pt is well defined, that is, for every s ∈ S and a ∈ A, ∑_{s'∈S} Pt(s'|s,a) = 1.
We will not discuss how or when the immediate reward reaches the agent. It may be accumulated over the time frame [t, t+1], or it may be given at a single point in time between t and t+1. In any case, all that matters to the agent is that the immediate reward reaches it before time t+1.
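As a concrete illustration, the sketch below samples one step of these dynamics: the agent receives a draw of the immediate reward Rt(s,a), and the system moves to a state s' drawn from Pt(.|s,a). The names step, sample_reward, and transition_probs are assumptions made for this example and are not part of the definition above.

```python
import random

def step(t, s, a, transition_probs, sample_reward):
    """Perform one step of the process at time t from state s with action a.

    transition_probs(t, s, a) -> dict mapping each next state s' to Pt(s'|s,a)
    sample_reward(t, s, a)    -> a draw of the immediate reward Rt(s,a)
    """
    reward = sample_reward(t, s, a)        # immediate reward Rt(s,a)
    probs = transition_probs(t, s, a)      # the distribution Pt(.|s,a)
    next_states = list(probs.keys())
    weights = list(probs.values())
    s_next = random.choices(next_states, weights=weights)[0]  # s' ~ Pt(.|s,a)
    return reward, s_next
```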
A Markovian process is defined as a process in which the only information needed from the history is the current state. We define a Markovian process by the tuple (T, S, A, Pt(.|s,a), Rt(s,a)). The process defined here is Markovian, since the subsequent states and immediate rewards (and therefore the whole continuation of the process) depend only on the current state and the chosen action, not on the history.
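To make the tuple concrete, the following is a minimal sketch of how (T, S, A, Pt(.|s,a), Rt(s,a)) might be represented in code, together with the well-definedness check that each Pt(.|s,a) sums to 1 over s'. The class name MarkovianProcess, its field names, and the check_well_defined helper are assumptions of this sketch, not notation from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class MarkovianProcess:
    horizon: int                                     # T: number of time steps
    states: List[str]                                # S: state space
    actions: List[str]                               # A: action space
    P: Callable[[int, str, str], Dict[str, float]]  # Pt(.|s,a): distribution over next states
    R: Callable[[int, str, str], float]             # a draw of the immediate reward Rt(s,a)

    def check_well_defined(self, tol: float = 1e-9) -> bool:
        """Check that for every t, s, a the probabilities Pt(.|s,a) sum to 1."""
        return all(
            abs(sum(self.P(t, s, a).values()) - 1.0) <= tol
            for t in range(self.horizon)
            for s in self.states
            for a in self.actions
        )
```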