Notations
- M is an MDP
- S is the set of states in M
- A is the set of actions in M
- $R_M^a(s)$ is the reward of action $a$ from state $s$. $R_M^a(s)$ is assumed to be constant in the article (non-stochastic rewards); this causes no loss of generality, since a stochastic reward can be replaced by its expectation.
- $P^a_{s,j}$ is the probability of reaching state $j$ by performing action $a$ from state $s$ (see the sketch below).
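To make the notation concrete, here is a minimal sketch, in Python, of an MDP represented with dictionaries. The class and field names (MDP, rewards, transitions) and the two-state example are illustrative assumptions, not part of the notes; the notes define only the mathematical objects $M$, $S$, $A$, $R_M^a(s)$ and $P^a_{s,j}$.

from dataclasses import dataclass

@dataclass
class MDP:
    states: set        # S, the set of states of M
    actions: set       # A, the set of actions of M
    rewards: dict      # rewards[(s, a)] = R_M^a(s), a fixed (non-stochastic) reward
    transitions: dict  # transitions[(s, a)][j] = P^a_{s,j}; each inner dict sums to 1

# A two-state illustration: "stay" keeps the current state,
# "move" switches state with probability 0.9.
example = MDP(
    states={0, 1},
    actions={"stay", "move"},
    rewards={(0, "stay"): 0.0, (0, "move"): 1.0,
             (1, "stay"): 0.5, (1, "move"): 1.0},
    transitions={
        (0, "stay"): {0: 1.0},
        (0, "move"): {0: 0.1, 1: 0.9},
        (1, "stay"): {1: 1.0},
        (1, "move"): {0: 0.9, 1: 0.1},
    },
)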