Online Versus Off-Line Updates

The basic advantage of TD(0) is that it can perform online updates, using only the information received up to the current point. To update the estimate for a state, all we need to know is the next state and the immediate reward observed on the way to it.
Here is an example of a run of a TD(0)-like algorithm:
An employee estimates the time he needs to get home from work.

1. State: Leaving for home; Elapsed time: 0; Estimated time to goal: 30; Total estimate: 30
2. State: Got into the car, it's raining; Elapsed time: 5; Estimated time to goal: 35; Total estimate: 40
3. State: Getting off the highway; Elapsed time: 20; Estimated time to goal: 15; Total estimate: 35
4. State: There is a big truck ahead; Elapsed time: 30; Estimated time to goal: 10; Total estimate: 40
5. State: Got to the right street; Elapsed time: 40; Estimated time to goal: 3; Total estimate: 43
6. State: Got home; Elapsed time: 43; Estimated time to goal: 0; Total estimate: 43

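An MC-like algorithm has to wait until the run is finished before making any update, because only then is the actual total time (43 in this run) known; all of its updates are made at that point. TD(0) makes its updates online: as soon as the next state is reached, the estimate for the previous state is moved toward the time spent on that step plus the estimate at the new state (hopefully in the right direction).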
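
The difference in timing can be illustrated with a small sketch that computes, for the run above, the target each method would move its estimate toward. This is only an illustration under stated assumptions, not a definitive implementation: time is treated as an undiscounted cost, the "estimated time to goal" column plays the role of the current value estimate, and the names `trajectory`, `td0_targets`, `mc_targets`, `apply_updates` and the step size `alpha` are hypothetical and do not appear in the notes.

```python
# Run from the example: (state, elapsed time, estimated time to goal).
trajectory = [
    ("leaving for home", 0, 30),
    ("in the car, raining", 5, 35),
    ("getting off the highway", 20, 15),
    ("big truck ahead", 30, 10),
    ("on the right street", 40, 3),
    ("home", 43, 0),
]


def td0_targets(traj):
    """TD(0) target for each non-terminal state (assumes no discounting).

    The target for step i needs only step i+1, so it is available online:
    time spent on the step plus the current estimate at the next state.
    """
    targets = []
    for (_, t, _), (_, t_next, v_next) in zip(traj, traj[1:]):
        step_cost = t_next - t
        targets.append(step_cost + v_next)
    return targets


def mc_targets(traj):
    """MC target for each non-terminal state.

    Every target needs the final elapsed time, so none can be computed
    before the run has finished.
    """
    total = traj[-1][1]
    return [total - t for (_, t, _) in traj[:-1]]


def apply_updates(traj, targets, alpha):
    """Move each estimate a fraction alpha (assumed step size) toward its target."""
    return [v + alpha * (g - v) for (_, _, v), g in zip(traj, targets)]


td = td0_targets(trajectory)
mc = mc_targets(trajectory)
for (state, _, estimate), g_td, g_mc in zip(trajectory, td, mc):
    print(f"{state:25s} estimate={estimate:2d}  TD(0) target={g_td:2d}  MC target={g_mc:2d}")
```

For this run the TD(0) targets are 40, 30, 20, 13, 3 and the MC targets are 43, 38, 23, 13, 3. Each TD(0) target is known one step after its state is visited, while all the MC targets depend on the final elapsed time of 43.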

Yishay Mansour
2000-01-06