SARSA
SARSA is an on-line algorithm. It keeps track of two states at each step: the current state
and the next state.
Its name, "S(s),A(a),R(r),S(s'),A(a')", is derived from the fact that each update uses
the current state (s), the current action (a), the current reward (r), the next state (s')
and the next action (a'). We update Q(s, a) using the reward r and the difference between
the value at the next pair (s', a') and the current value at (s, a).
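Written out, one common form of this update is given below. The learning rate \alpha and the discount factor \gamma are symbols assumed here for illustration; they do not appear in the text above, and the notes may use different notation.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma\, Q(s',a') - Q(s,a) \right]
```

The bracketed term is the difference mentioned above: the reward plus the discounted value of the next pair, minus the current value.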
Figure 9.2: Algorithm for SARSA
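The loop described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not necessarily the algorithm shown in the figure: the names epsilon_greedy, sarsa_episode and chain_step, the parameters alpha, gamma and epsilon, the epsilon-greedy action choice, and the three-state chain environment are all hypothetical and introduced only for illustration.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one.

    Ties are broken in favour of the earliest action in `actions`.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


def sarsa_episode(Q, step, start, actions, alpha, gamma, epsilon, rng,
                  max_steps=100):
    """Run one episode, updating Q in place from (s, a, r, s', a') tuples."""
    s = start
    a = epsilon_greedy(Q, s, actions, epsilon, rng)
    for _ in range(max_steps):
        r, s_next, done = step(s, a)
        # The next action is chosen before the update, so the update
        # uses the action that will actually be taken in s'.
        a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)
        # Assumption: a terminal state contributes no future value.
        target = r if done else r + gamma * Q[(s_next, a_next)]
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        if done:
            break
        s, a = s_next, a_next
    return Q


def chain_step(s, a):
    """Hypothetical chain with states 0, 1, 2; reaching 2 pays 1 and ends."""
    s_next = min(s + 1, 2) if a == "right" else max(s - 1, 0)
    done = s_next == 2
    return (1.0 if done else 0.0), s_next, done


# Usage: a few hundred episodes on the toy chain with a seeded generator.
Q_demo = defaultdict(float)
rng_demo = random.Random(0)
for _ in range(200):
    sarsa_episode(Q_demo, chain_step, 0, ["right", "left"],
                  alpha=0.1, gamma=0.9, epsilon=0.1, rng=rng_demo)
```

Storing Q in a defaultdict keyed by (state, action) keeps the sketch short; for a large or continuous state space a different representation would be needed.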
Yishay Mansour
2000-01-07