Notations
- M is an MDP
- S is the set of states in M
- A is the set of actions in M
- $R_M^a(s)$ is the reward of action $a$ from state $s$. $R_M^a(s)$ is assumed to be constant in the article (non-stochastic rewards); this causes no loss of generality, since a stochastic reward can be replaced by its expectation.
- $P^a_{s,j}$ is the probability of reaching state $j$ by performing action $a$ from state $s$ (see the sketch below).
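To make the notation concrete, here is a minimal sketch, in Python, of an MDP represented with dictionaries. The class and field names (MDP, rewards, transitions) and the two-state example are illustrative assumptions, not part of the notes; the notes define only the mathematical objects $M$, $S$, $A$, $R_M^a(s)$ and $P^a_{s,j}$.

from dataclasses import dataclass

@dataclass
class MDP:
    states: set        # S, the set of states of M
    actions: set       # A, the set of actions of M
    rewards: dict      # rewards[(s, a)] = R_M^a(s), a fixed (non-stochastic) reward
    transitions: dict  # transitions[(s, a)][j] = P^a_{s,j}; each inner dict sums to 1

# A two-state illustration: "stay" keeps the current state,
# "move" switches state with probability 0.9.
example = MDP(
    states={0, 1},
    actions={"stay", "move"},
    rewards={(0, "stay"): 0.0, (0, "move"): 1.0,
             (1, "stay"): 0.5, (1, "move"): 1.0},
    transitions={
        (0, "stay"): {0: 1.0},
        (0, "move"): {0: 0.1, 1: 0.9},
        (1, "stay"): {1: 1.0},
        (1, "move"): {0: 0.9, 1: 0.1},
    },
)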