Next: Optimality Proof
Up: Example: The Recruiting Problem
Previous: The Goal
The Solution
We will first construct a corresponsing MDP for the problem, and then develop the
optimal policy for it.
Let
the possible actions, where Continue stands for 'continue' and
QuitAndHire for 'quit and recruit'.
Let
be the group of states of the MDP, standing for:
- MaxSoFar - the last interviewed candidate was the best so far
- Other - the last interviewed candidate was NOT the best so far
- 1 - the last interviewed candidate was THE best candidate in the entire group and was hired
- 0 - the last interviewed candidate was NOT the best candidate in the entire group and was hired
Figure:
The recruiting problem MDP
|
Figure shows the resultant MDP, using the following transition probabilities:
where N is the finite time horizon (the size of the candidates group) and t is the current time
(the number of candidates interviewed so far).
Writing the optimality equations for this problem we get:
Assigning the terms for qt and rt we get:
|
(4.1) |
|
(4.2) |
An optimal decision rule has the following properties:
- In s=Other, always choose
as=Continue (otherwise the return is promised to be zero)
- In
s=MaxSoFar, choose
as=Continue if
and
as=QuitAndHire otherwise
(in order to maximize
Ut*(MaxSoFar))
We now show that the optimal policy is of the form:
- Interview
candidates, performing
- Quit and hire the first candidate after time ,
which is the best so far
Next: Optimality Proof
Up: Example: The Recruiting Problem
Previous: The Goal
Yishay Mansour
1999-11-18