Approximate Policy Iteration
The general structure is the same as in Policy Iteration, except for the following differences:
- We will not use $V^{\pi}$; instead we use $\tilde{V}^{\pi}$ (or $\tilde{Q}^{\pi}$), which is only an approximation of $V^{\pi}$.
The reasons for using an approximation are that the chosen architecture may not be expressive enough and that the simulations introduce noise.
- Let $\pi'$ be the greedy policy with respect to $\tilde{V}^{\pi}$. In practice we might only take a policy $\hat{\pi}$ which is close to $\pi'$.
These two differences are a source of error in the iteration.
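To make the structure concrete, the following is a minimal sketch of an approximate policy iteration loop on a small, fully known MDP. It assumes a linear architecture $\tilde{V} = \Phi w$ fit by least squares to a noisy evaluation of $V^{\pi}$; the function name, the feature matrix Phi, and the Gaussian noise model are illustrative assumptions and not part of the original algorithm description.

import numpy as np

def approximate_policy_iteration(P, R, Phi, gamma=0.9, n_iters=50, noise=0.01, seed=0):
    """Sketch: approximate policy iteration on a small, fully known MDP.

    P   : array (n_actions, n_states, n_states) of transition probabilities
    R   : array (n_states, n_actions) of rewards
    Phi : array (n_states, k) of features; the architecture is tilde-V = Phi @ w
    The Gaussian noise added to the evaluation stands in for simulation error.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    pi = np.zeros(n_states, dtype=int)                 # arbitrary initial policy

    for _ in range(n_iters):
        # Approximate policy evaluation: solve for V^pi, corrupt it with noise,
        # then project it onto the architecture (least-squares fit of w).
        P_pi = P[pi, np.arange(n_states)]              # row s is P[pi[s], s, :]
        R_pi = R[np.arange(n_states), pi]
        V_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        w, *_ = np.linalg.lstsq(Phi, V_pi + noise * rng.standard_normal(n_states),
                                rcond=None)
        V_tilde = Phi @ w                              # tilde-V, only approximates V^pi

        # Greedy improvement with respect to tilde-V (computed exactly here;
        # in the approximate setting we might only get a policy close to it).
        Q = np.stack([R[:, a] + gamma * P[a] @ V_tilde for a in range(n_actions)],
                     axis=1)
        pi = Q.argmax(axis=1)

    return pi, V_tilde

The two error sources discussed above correspond to the two approximate steps in the loop: the least-squares fit of the noisy evaluation, and the improvement step that is greedy only with respect to $\tilde{V}$ rather than $V^{\pi}$.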
Figure: Regular Policy Iteration
Figure: Approximate Policy Iteration