
   
Policy Iteration

In this section we present the Policy Iteration (PI) algorithm for finding the optimal policy in a discounted infinite-horizon problem. Unlike Value Iteration, whose output is only an approximation of the optimal policy, PI outputs the optimal policy itself: for finite state and action sets it terminates after a finite number of iterations.
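To make the idea concrete, here is a minimal sketch of the usual evaluate/improve loop for a finite MDP. It is an illustration under stated assumptions and may differ in details from the algorithm as formally stated in these notes. The function name `policy_iteration` and the array layout are hypothetical: `P[a, s, t]` is the probability of moving from state `s` to state `t` under action `a`, `R[s, a]` is the expected immediate reward, and `lam` is the discount factor. Each iteration evaluates the current policy exactly by solving a linear system and then switches to the greedy policy with respect to that value.

```python
import numpy as np


def policy_iteration(P, R, lam, tol=1e-12, max_iter=1000):
    """Sketch of Policy Iteration for a finite discounted MDP.

    Assumed (hypothetical) layout:
      P   -- shape (A, S, S), P[a, s, t] = Pr(next state t | state s, action a)
      R   -- shape (S, A), R[s, a] = expected immediate reward
      lam -- discount factor in [0, 1)
    Returns (policy, v): a deterministic policy (one action index per state)
    and its value vector.
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # arbitrary initial policy

    for _ in range(max_iter):
        # Policy evaluation: solve (I - lam * P_pi) v = r_pi exactly.
        P_pi = P[policy, states, :]   # row s is P[policy[s], s, :]
        r_pi = R[states, policy]      # r_pi[s] = R[s, policy[s]]
        v = np.linalg.solve(np.eye(n_states) - lam * P_pi, r_pi)

        # Policy improvement: q[s, a] = R[s, a] + lam * sum_t P[a, s, t] v[t].
        q = R + lam * np.einsum("ast,t->sa", P, v)
        best = q.max(axis=1)

        # Keep the current action when it is already (near-)greedy, so the
        # loop cannot cycle between equally good policies.
        keep = q[states, policy] >= best - tol
        new_policy = np.where(keep, policy, np.argmax(q, axis=1))

        if np.array_equal(new_policy, policy):
            return policy, v  # no state changes its action: stop
        policy = new_policy

    return policy, v  # safeguard only; not reached for small finite MDPs
```

For example, with two states where action 0 stays put and action 1 switches states, and the only reward is 1 for staying in state 1, the sketch returns the policy that moves from state 0 to state 1 and then stays there; with `lam = 0.9` its values are 9 and 10. Evaluating each policy exactly with a linear solve, rather than iteratively, is what separates this loop from Value Iteration.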



 

Yishay Mansour
1999-12-18