Online Versus Off-Line Updates
The basic advantage of TD(0) is that it can perform online
updates based on the information received up to the current point.
To perform an update, all TD(0) needs to know is the next state.
Here is an example run of a TD(0)-like algorithm:
An employee
estimates the time he needs to get home from work.
1. State: Leaving for home; Elapsed time: 0;
   Estimated time to goal: 30; Total estimate: 30
2. State: Got into the car, it's raining; Elapsed time: 5;
   Estimated time to goal: 35; Total estimate: 40
3. State: Getting off the highway; Elapsed time: 20;
   Estimated time to goal: 15; Total estimate: 35
4. State: There is a big truck ahead; Elapsed time: 30;
   Estimated time to goal: 10; Total estimate: 40
5. State: Got to the right street; Elapsed time: 40;
   Estimated time to goal: 3; Total estimate: 43
6. State: Got home; Elapsed time: 43;
   Estimated time to goal: 0; Total estimate: 43
An MC-like algorithm must finish the run before it can make any
update, because it needs the actual total time. TD(0) instead
updates online, after every step, moving each estimate toward the
next one (hopefully in the right direction).
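The difference can be made concrete on the run above. The following is a minimal sketch, not part of the original notes: it assumes no discounting, treats the time elapsed between consecutive states as the cost of that step, and reports only the error each method would correct (no step size is applied). The names RUN, td0_errors and mc_errors are hypothetical.

```python
# Each entry: (state, elapsed time, estimated time to goal), copied from the run above.
RUN = [
    ("Leaving for home", 0, 30),
    ("Got into the car, it's raining", 5, 35),
    ("Getting off the highway", 20, 15),
    ("There is a big truck ahead", 30, 10),
    ("Got to the right street", 40, 3),
    ("Got home", 43, 0),
]


def td0_errors(run):
    """TD(0)-style errors: each one needs only the current and the next state.

    Assumption: undiscounted, with the step cost equal to the time that
    elapsed between the two states.
    """
    errors = []
    for (_, t, estimate), (_, t_next, estimate_next) in zip(run, run[1:]):
        step_cost = t_next - t
        errors.append(step_cost + estimate_next - estimate)
    return errors


def mc_errors(run):
    """MC-style errors: each one needs the final elapsed time of the whole run."""
    total_time = run[-1][1]
    return [(total_time - t) - estimate for _, t, estimate in run[:-1]]


if __name__ == "__main__":
    print("TD(0) errors, available after each step:", td0_errors(RUN))
    print("MC errors, available only at the end:   ", mc_errors(RUN))
```

Under these assumptions each TD(0) error equals the change in the total estimate between consecutive states (for example 40 - 30 = 10 after the first step), while each MC error is the gap between the estimate and the time that actually remained (43 - 0 - 30 = 13 for the first state).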
Yishay Mansour
2000-01-06