TD(0) - Optimal Control
Learn the Q function online.
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha [ r_t + \gamma Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t) ]
a_{t+1} is chosen by the ε-greedy policy for Q_t.
CONVERGENCE: GUARANTEED [S,JJS]
This is the SARSA algorithm.
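
A minimal tabular sketch of one update step, assuming Q is stored as a dictionary keyed by (state, action) pairs. The names epsilon_greedy and sarsa_update, the state and action labels, and the numeric values are illustrative assumptions, not part of the slide.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise a greedy one under Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """Apply one on-policy TD(0) update to Q in place and return the TD error."""
    td_error = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


# Hypothetical two-state example; unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
Q[("s1", "right")] = 1.0

# epsilon = 0.0 makes the choice purely greedy, so the example is deterministic.
a_next = epsilon_greedy(Q, "s1", ["left", "right"], epsilon=0.0)
td = sarsa_update(Q, "s0", "right", r=0.5, s_next="s1", a_next=a_next,
                  alpha=0.1, gamma=0.9)
```

The target uses Q at the action actually selected for s_{t+1}, not the maximum over actions, which is what makes the update on-policy.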