The Solution

Next: Optimality Proof Up: Example: The Recruiting Problem Previous: The Goal

The Solution

We will first construct a corresponsing MDP for the problem, and then develop the optimal policy for it.
Let $A=\{Continue, QuitAndHire\}$ the possible actions, where Continue stands for 'continue' and QuitAndHire for 'quit and recruit'. Let $S=\{MaxSoFar, Other, 1, 0\}$ be the group of states of the MDP, standing for:

MaxSoFar - the last interviewed candidate was the best so far
Other - the last interviewed candidate was NOT the best so far
1 - the last interviewed candidate was THE best candidate in the entire group and was hired
0 - the last interviewed candidate was NOT the best candidate in the entire group and was hired

**Figure:** The recruiting problem MDP

Figure

shows the resultant MDP, using the following transition probabilities:

$q_t=Prob [\max\{x_1, ..., x_t\}=\max\{x_1, ..., x_N\}]=\frac{t}{N}$
$r_t=Prob [x_{t+1} \geq \max\{x_1, ..., x_t\}]=\frac{1}{t+1}$

where N is the finite time horizon (the size of the candidates group) and t is the current time (the number of candidates interviewed so far).
Writing the optimality equations for this problem we get:

$\begin{displaymath}U_t^*(1)=1, \;\; U_t^*(0)=0, \;\; U_N^*(MaxSoFar)=U_N^*(Other)=0\end{displaymath}$

$\begin{eqnarray*}U_t^*(Other) & = & \max\{0, r_tU_{t+1}^*(MaxSoFar)+(1-r_t)U_{t+... ...-r_t)U_{t+1}^*(Other)\} \\ & = & \max\{q_t, U_t^*(Other)\} \\ \end{eqnarray*}$

Assigning the terms for q_t and r_t we get:

$\begin{displaymath}U_t^*(Other)=\frac{1}{t+1}U_{t+1}^*(MaxSoFar)+\frac{t}{t+1}U_{t+1}^*(Other) \end{displaymath}$

(4.1)

$\begin{displaymath}U_t^*(MaxSoFar)=\max\{\frac{t}{N}, U_t^*(Other)\} \end{displaymath}$

(4.2)

An optimal decision rule has the following properties:

In s=Other, always choose a_s=Continue (otherwise the return is promised to be zero)
In s=MaxSoFar, choose a_s=Continue if $\frac{t}{N}<U_t^*(Other)$ and a_s=QuitAndHire otherwise (in order to maximize U_t^*(MaxSoFar))

We now show that the optimal policy is of the form:

Interview $\tau$ candidates, performing $a_t=Continue, \; t\leq\tau$
Quit and hire the first candidate after time $\tau$ , which is the best so far

Next: Optimality Proof Up: Example: The Recruiting Problem Previous: The Goal

Yishay Mansour
1999-11-18