Question 1
Show an example of an MDP with n states and two actions where the Policy Iteration algorithm requires n iterations to converge to the optimal policy.
(Open problem: show an example that requires more than n iterations.)
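For experimenting with candidate constructions, here is a minimal tabular Policy Iteration sketch; the array shapes (P of shape (A, S, S), R of shape (S, A)), the discount gamma, and the iteration counter are assumptions for illustration, not part of the problem statement:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Tabular Policy Iteration.

    P: transition probabilities, shape (A, S, S); P[a, s, t] = Pr(t | s, a).
    R: rewards, shape (S, A).
    Returns the final policy, its value, and the number of iterations used.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)  # start from "always action 0"
    iterations = 0
    while True:
        iterations += 1
        # Policy evaluation: solve the linear system V = R_pi + gamma * P_pi V.
        P_pi = P[policy, np.arange(S)]    # row s is P[policy[s], s, :]
        R_pi = R[np.arange(S), policy]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to Q(s, a).
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V, iterations
        policy = new_policy
```

Counting `iterations` as a function of n on a candidate family of MDPs is one way to check the required lower bound.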
Question 2
Let π1 and π2 be two deterministic stationary policies for a given MDP M. Construct a new policy π in the following way:
Question 3
We can use the approximate value function computed by the Value Iteration algorithm to define a policy.
Prove that if the numbers of states and actions are finite, then after a finite number of iterations this policy equals the optimal policy and never changes.
Give an example of an MDP where the number of iterations until the policy defined by Value Iteration converges to the optimal policy is at least .
(Hint: build an MDP such that from the start state you can branch to two different states; once you reach a state, you remain in it.)
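The policy referred to here is the one that is greedy with respect to the current value estimate: π_t(s) = argmax_a [R(s, a) + γ Σ_t' P(t' | s, a) V_t(t')]. A minimal sketch that records this greedy policy after every Value Iteration sweep, assuming a tabular MDP with transition array P of shape (A, S, S) and reward array R of shape (S, A) (these representations are an assumption for illustration):

```python
import numpy as np

def value_iteration_policies(P, R, gamma=0.9, n_sweeps=50):
    """Run Value Iteration, recording the greedy policy after each sweep.

    P: transition probabilities, shape (A, S, S); R: rewards, shape (S, A).
    Returns the list of greedy policies, one per sweep.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    policies = []
    for _ in range(n_sweeps):
        # One Bellman optimality backup:
        # Q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) * V(t)
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V = Q.max(axis=1)
        policies.append(Q.argmax(axis=1))  # greedy policy for this iterate
    return policies
```

Watching the first sweep at which `policies[t]` stops changing, on an MDP built as in the hint, is one way to probe the lower bound the question asks for.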
The homework is due in two weeks.