Policy Iteration
In this section we present the Policy Iteration algorithm
(also referred to as PI) for finding the
optimal policy in a discounted infinite-horizon
problem. Unlike the Value Iteration algorithm, PI does not
output an approximation of the optimal policy;
it outputs the optimal policy itself.
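
To make the contrast with Value Iteration concrete, here is a minimal Python sketch of the usual evaluate-then-improve loop. It is an illustration under stated assumptions, not necessarily the formulation used in these notes: the MDP is assumed finite and given as tables, and the names `P`, `R`, `gamma`, `solve_linear` and `policy_iteration` are hypothetical. Each policy is evaluated by solving a linear system (exact up to floating-point rounding), and the loop stops when no state can improve its action, which is why the result is a policy rather than an approximation of one.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def policy_iteration(P, R, gamma):
    """Assumed layout: P[s][a][t] is the probability of moving from s to t
    under action a, R[s][a] is the expected reward, and 0 <= gamma < 1."""
    n = len(P)
    policy = [0] * n  # arbitrary initial deterministic policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi.
        a_mat = [[(1.0 if s == t else 0.0) - gamma * P[s][policy[s]][t]
                  for t in range(n)] for s in range(n)]
        b_vec = [R[s][policy[s]] for s in range(n)]
        v = solve_linear(a_mat, b_vec)

        # Policy improvement: switch only on a strict gain, so ties
        # cannot cause the loop to cycle.
        changed = False
        for s in range(n):
            def q(a):
                return R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
            best = max(range(len(P[s])), key=q)
            if q(best) > q(policy[s]) + 1e-12:
                policy[s] = best
                changed = True
        if not changed:
            return policy, v


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
policy, values = policy_iteration(P, R, 0.9)
print(policy)  # [1, 0]: leave state 0, stay in state 1
```

In this toy problem the loop stops after two evaluations. The tolerance `1e-12` in the improvement step is a design choice of the sketch that guards against switching actions on rounding noise.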
Yishay Mansour
1999-12-18