The Value Iteration Algorithm
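Input: MDP $M$ with discount factor $\lambda \in (0,1)$, and parameter $\epsilon > 0$
1. Choose an initial return value function $v^0$ (by choosing a number $v^0(s)$ for each state $s \in S$).
2. Set $n = 0$.
3. Assign the next return value function:
4. For each $s \in S$: $v^{n+1}(s) = \max_{a \in A} \{ r(s,a) + \lambda \sum_{j \in S} p(j \mid s,a) \, v^n(j) \}$
5. If $\| v^{n+1} - v^n \| < \epsilon (1-\lambda) / (2\lambda)$, go to step 6. Else set $n = n + 1$ and return to (3).
6. Choose the output policy $\pi_\epsilon$ such that: $\pi_\epsilon(s) \in \arg\max_{a \in A} \{ r(s,a) + \lambda \sum_{j \in S} p(j \mid s,a) \, v^{n+1}(j) \}$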
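
The steps above can be sketched in Python as follows. This is a minimal illustration, not the notes' own code. It assumes the standard discounted setting: a discount factor `lam` strictly between 0 and 1, the maximum (sup) norm in the stopping test, and the stopping threshold `eps * (1 - lam) / (2 * lam)`. The function name `value_iteration`, the dictionary layout of `r` and `p`, and the two-state example MDP are all hypothetical choices made for this sketch.

```python
def value_iteration(states, actions, r, p, lam, eps):
    """Sketch of discounted value iteration (assumes 0 < lam < 1).

    r[s][a] : immediate reward for taking action a in state s
    p[s][a] : dict mapping each next state j to its probability
    """
    def backup(v, s, a):
        # One-step lookahead: reward plus discounted expected next value.
        return r[s][a] + lam * sum(prob * v[j] for j, prob in p[s][a].items())

    # Steps 1-2: arbitrary initial value function (all zeros here).
    v = {s: 0.0 for s in states}
    # Assumed stopping threshold, in the maximum norm.
    threshold = eps * (1 - lam) / (2 * lam)

    while True:
        # Steps 3-4: compute the next value function for every state.
        v_next = {s: max(backup(v, s, a) for a in actions) for s in states}
        diff = max(abs(v_next[s] - v[s]) for s in states)
        v = v_next
        # Step 5: stop once successive value functions are close enough.
        if diff < threshold:
            break

    # Step 6: output a policy that is greedy with respect to the last values.
    policy = {s: max(actions, key=lambda a: backup(v, s, a)) for s in states}
    return v, policy


# Hypothetical two-state example: staying in state 1 pays reward 1 per step.
states = [0, 1]
actions = ["stay", "move"]
r = {0: {"stay": 0.0, "move": 0.0}, 1: {"stay": 1.0, "move": 0.0}}
p = {
    0: {"stay": {0: 1.0}, "move": {1: 1.0}},
    1: {"stay": {1: 1.0}, "move": {0: 1.0}},
}
v, policy = value_iteration(states, actions, r, p, lam=0.5, eps=1e-6)
```

In this example the optimal values can be computed by hand: staying in state 1 forever is worth $1/(1-0.5) = 2$, and moving from state 0 to state 1 is worth $0.5 \cdot 2 = 1$, so the returned `v` should be close to these numbers and the greedy policy should choose `move` in state 0 and `stay` in state 1.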
Yishay Mansour