The Value Iteration Algorithm
Input: an MDP and parameters $\epsilon > 0$ (the accuracy) and $\lambda \in (0,1)$ (the discount factor).
- 1.
- Choose an initial return value function $v_0$ (by choosing a number $v_0(s)$ for each $s \in S$).
- 2.
- Set $n = 0$.
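- 3.
- Assign the next return value function: for every $s \in S$,
  $$v_{n+1}(s) = \max_{a \in A} \Big\{ r(s,a) + \lambda \sum_{j \in S} p(j \mid s,a)\, v_n(j) \Big\}.$$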
- 4.
- Set $n = n + 1$.
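- 5.
- If $\|v_n - v_{n-1}\| < \frac{\epsilon(1-\lambda)}{2\lambda}$, where $\|v\| = \max_{s \in S} |v(s)|$, stop; else return to step (3).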
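- 6.
- Choose the output policy $\pi$ such that, for every $s \in S$,
  $$\pi(s) \in \arg\max_{a \in A} \Big\{ r(s,a) + \lambda \sum_{j \in S} p(j \mid s,a)\, v_n(j) \Big\}.$$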
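The steps above can be sketched in Python as follows. This is a minimal illustration, not the notes' own code. It assumes a hypothetical encoding of a finite MDP as nested lists: `P[s][a]` is a list of `(j, p(j|s,a))` pairs and `R[s][a]` is the reward $r(s,a)$. The names `value_iteration` and `q_value` are illustrative. Step 5 uses the max norm and the threshold $\epsilon(1-\lambda)/(2\lambda)$.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of a finite MDP (an assumption, not from the notes):
#   P[s][a] -> list of (next_state j, probability p(j|s,a)) pairs
#   R[s][a] -> immediate reward r(s, a)
Transitions = List[List[List[Tuple[int, float]]]]
Rewards = List[List[float]]


def q_value(P: Transitions, R: Rewards, v: List[float],
            s: int, a: int, lam: float) -> float:
    """Return r(s,a) + lam * sum_j p(j|s,a) * v(j)."""
    return R[s][a] + lam * sum(p * v[j] for j, p in P[s][a])


def value_iteration(P: Transitions, R: Rewards, lam: float, eps: float,
                    v0: Optional[List[float]] = None):
    """Run value iteration; return (policy, v_n, n)."""
    if not (0.0 < lam < 1.0) or eps <= 0.0:
        raise ValueError("need 0 < lam < 1 and eps > 0")
    num_states = len(P)
    # Step 1: initial return value function (all zeros unless v0 is given).
    v = list(v0) if v0 is not None else [0.0] * num_states
    # Step 2.
    n = 0
    threshold = eps * (1.0 - lam) / (2.0 * lam)
    while True:
        # Step 3: one Bellman backup for every state.
        v_next = [max(q_value(P, R, v, s, a, lam) for a in range(len(P[s])))
                  for s in range(num_states)]
        # Step 4.
        n += 1
        # Step 5: max-norm distance between the two latest iterates.
        gap = max(abs(x - y) for x, y in zip(v_next, v))
        v = v_next
        if gap < threshold:
            break
    # Step 6: greedy policy with respect to the final v_n
    # (ties go to the lowest action index, since max() keeps the first maximum).
    policy = []
    for s in range(num_states):
        actions = range(len(P[s]))
        policy.append(max(actions, key=lambda a: q_value(P, R, v, s, a, lam)))
    return policy, v, n


if __name__ == "__main__":
    # Two states, two actions each: action 0 = stay, action 1 = switch state.
    P = [[[(0, 1.0)], [(1, 1.0)]],
         [[(1, 1.0)], [(0, 1.0)]]]
    R = [[0.0, 1.0],   # state 0: stay pays 0, switch pays 1
         [2.0, 0.0]]   # state 1: stay pays 2, switch pays 0
    policy, v, n = value_iteration(P, R, lam=0.9, eps=0.01)
    print(policy)  # [1, 0]: switch out of state 0, then stay in state 1
    print(v, n)
```

In this toy MDP with $\lambda = 0.9$, staying in state 1 forever earns $2/(1-0.9) = 20$. From state 0, switching earns $1 + 0.9 \cdot 20 = 19$. The returned $v_n$ should therefore be close to $(19, 20)$.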
Yishay Mansour
1999-12-18