Next: About this document ...
Up: Linear Programming
Previous: A Linear Programming Example
Use of
Linear Programming to Find the Optimal Policy
The general
translation of the problem
to a linear
program is as follows:
Minimize: x
Subject To:
It was shown that the Optimality Equations of the discounted
infinite horizon problem are:
Using the above translation we would get the following linear
program:
Minimize:
Subject To:
is any set of constants with the following
characteristics:
These constants can be viewed as the probability
distribution of the agent's initial state in the MDP.
Accordingly, the dual linear program would be:
Maximize:
Subject To:
Intuitively,
can be thought of as the
probability of being in state s and performing action a, while taking into account
the discount factor - .
In the following theorem
is defined more formally.
Theorem is assuring that every solution to the dual linear
program can be translated to a policy with a return value equal to
the maximized element in the program. Thus, the best solution of
the program can be translated to a policy with the maximal
possible return value. This policy will, obviously, be the best
policy.
Next: About this document ...
Up: Linear Programming
Previous: A Linear Programming Example
Yishay Mansour
1999-12-18