Forward and Backward Probabilities



Problem 6.6 The likelihood problem.
INPUT: A hidden Markov model $\mathcal{M} = (\Sigma,Q,\Theta)$ and a sequence $X = (x_{1},\ldots,x_{L}) \in \Sigma^{\ast}$, for which the generating path $\Pi=(\pi_{1},\ldots,\pi_{L})$ is unknown.
QUESTION: For each $1 \leq i \leq L$ and $k \in Q$, compute the probability $P(\pi_{i}=k\vert X)$.

For this we shall need some extra definitions.
Forward algorithm: Given a sequence $X=(x_{1},\ldots,x_{L})$, let us denote by $f_{k}(i)$ the probability of emitting the prefix $(x_{1},\ldots,x_{i})$ and being in state $\pi_{i}=k$ when $x_{i}$ is emitted, i.e., $f_{k}(i) = P(x_{1},\ldots,x_{i},\pi_{i}=k)$.
We use the same initial values for $f_{k}(0)$ as in the Viterbi algorithm:

$\begin{displaymath}f_{begin}(0) = 1 \end{displaymath}$

(6.20)

$\begin{displaymath}\forall_{k \neq begin} \quad f_{k}(0) = 0 \end{displaymath}$

(6.21)

By analogy with (6.14), we use the recursive formula:

$\begin{displaymath}f_{l}(i+1) = e_{l}(x_{i+1}) \cdot \sum_{k \in Q} {f_{k}(i) \cdot a_{kl}} \end{displaymath}$

(6.22)

We terminate the process by calculating:

$\begin{displaymath}P(X) = \sum_{k \in Q}{f_{k}(L) \cdot a_{k,end}} \end{displaymath}$

(6.23)
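The forward recursion can be sketched in Python as follows. This is a minimal illustration, not the notes' own code: the two-state model, its probabilities, the alphabet {x, y}, and the names `STATES`, `A`, `E` and `forward` are all hypothetical. Transitions are stored as nested dictionaries that include the silent `"begin"` and `"end"` states, and a missing entry means probability 0.

```python
# Hypothetical two-state toy HMM over the alphabet {"x", "y"}.
# "begin" and "end" are silent states; a missing transition means probability 0.
STATES = ["A", "B"]
A = {  # transition probabilities a_{kl}
    "begin": {"A": 0.5, "B": 0.5},
    "A": {"A": 0.8, "B": 0.1, "end": 0.1},
    "B": {"A": 0.2, "B": 0.7, "end": 0.1},
}
E = {  # emission probabilities e_k(x)
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.2, "y": 0.8},
}


def forward(seq, states=STATES, a=A, e=E):
    """Return (f, p) with f[i][k] = f_k(i) for i = 0..L and p = P(X)."""
    # Initial values (6.20)-(6.21): f_begin(0) = 1, all other f_k(0) = 0
    # (states absent from the dictionary are treated as 0).
    f = [{"begin": 1.0}]
    for i, symbol in enumerate(seq):  # symbol is x_{i+1}
        prev = f[i]
        # Recursion (6.22): f_l(i+1) = e_l(x_{i+1}) * sum_k f_k(i) * a_{kl}
        f.append({
            l: e[l][symbol] * sum(prev[k] * a[k].get(l, 0.0) for k in prev)
            for l in states
        })
    # Termination (6.23): P(X) = sum_k f_k(L) * a_{k,end}
    p = sum(f[-1].get(k, 0.0) * a[k].get("end", 0.0) for k in states)
    return f, p


f, p = forward("xy")
print(p)  # approximately 0.013
```

For X = xy this gives f_A(1) = 0.45, f_B(2) = 0.092 and P(X) = 0.013 (up to floating-point rounding), which agrees with summing the probabilities of the four possible state paths by hand.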

Backward algorithm: In a complementary manner, we denote by $b_{k}(i)$ the probability of the suffix $(x_{i+1},\ldots,x_{L})$ given $\pi_{i}=k$, i.e., $b_{k}(i) = P(x_{i+1},\ldots,x_{L}\vert\pi_{i}=k)$.
In this case, we initialize:

$\begin{displaymath}\forall_{k \in Q} \quad b_{k}(L) = a_{k,end} \end{displaymath}$

(6.24)

The recursive formula is:

$\begin{displaymath}b_{k}(i) = \sum_{l \in Q}{a_{kl} \cdot e_{l}(x_{i+1}) \cdot b_{l}(i+1)} \end{displaymath}$

(6.25)

We terminate the process by calculating:

$\begin{displaymath}P(X) = \sum_{l \in Q}{a_{begin,l} \cdot e_{l}(x_{1}) \cdot b_{l}(1)} \end{displaymath}$

(6.26)
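A matching minimal sketch of the backward recursion, again in Python and under the same assumptions: the toy model and the names `STATES`, `A`, `E` and `backward` are hypothetical, and the sequence is assumed to be non-empty. The list index follows the text (1-based positions), while the Python string `seq` is 0-based, so `seq[i]` is $x_{i+1}$.

```python
# Hypothetical two-state toy HMM over the alphabet {"x", "y"}.
# "begin" and "end" are silent states; a missing transition means probability 0.
STATES = ["A", "B"]
A = {  # transition probabilities a_{kl}
    "begin": {"A": 0.5, "B": 0.5},
    "A": {"A": 0.8, "B": 0.1, "end": 0.1},
    "B": {"A": 0.2, "B": 0.7, "end": 0.1},
}
E = {  # emission probabilities e_k(x)
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.2, "y": 0.8},
}


def backward(seq, states=STATES, a=A, e=E):
    """Return (b, p) with b[i][k] = b_k(i) for i = 1..L and p = P(X).

    Assumes a non-empty sequence; b[0] is left unused so indices match the text.
    """
    L = len(seq)
    b = [None] * (L + 1)
    # Initialization (6.24): b_k(L) = a_{k,end}
    b[L] = {k: a[k].get("end", 0.0) for k in states}
    # Recursion (6.25) for i = L-1 down to 1; seq[i] is x_{i+1}.
    for i in range(L - 1, 0, -1):
        b[i] = {
            k: sum(a[k].get(l, 0.0) * e[l][seq[i]] * b[i + 1][l] for l in states)
            for k in states
        }
    # Termination (6.26): P(X) = sum_l a_{begin,l} * e_l(x_1) * b_l(1)
    p = sum(a["begin"].get(l, 0.0) * e[l][seq[0]] * b[1][l] for l in states)
    return b, p


b, p = backward("xy")
print(b[1], p)
```

On X = xy it yields b_A(1) = 0.016, b_B(1) = 0.058 and P(X) = 0.013, the same value the forward termination (6.23) produces for this model; comparing the two results is a cheap sanity check on an implementation.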

Complexity: All the values of $f_{k}(i)$ and $b_{k}(i)$ can be computed in $O(L \cdot \vert Q\vert^2)$ time and stored in $O(L \cdot \vert Q\vert)$ space, as is the case with the Viterbi algorithm.
There is, however, one important difference: here we cannot simply switch to logarithmic weights, because unlike Viterbi, which only multiplies probabilities, these recursions also sum them, and the logarithm of a sum is not the sum of the logarithms. For long sequences the values of $f_{k}(i)$ and $b_{k}(i)$ become vanishingly small and may underflow, unless proper measures, such as scaling the probabilities, are taken.
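One common way to do the scaling is sketched below in Python: after each forward step the column $f(i)$ is divided by its sum $c_{i}$, and $\log P(X)$ is recovered as the sum of the $\log c_{i}$ plus the log of the termination sum (6.23) taken on the rescaled column. This per-position normalization is only one possible scheme (the notes do not say which one they have in mind), and the toy model and the names `STATES`, `A`, `E` and `scaled_log_likelihood` are hypothetical.

```python
import math

# Hypothetical two-state toy HMM over the alphabet {"x", "y"}.
# "begin" and "end" are silent states; a missing transition means probability 0.
STATES = ["A", "B"]
A = {  # transition probabilities a_{kl}
    "begin": {"A": 0.5, "B": 0.5},
    "A": {"A": 0.8, "B": 0.1, "end": 0.1},
    "B": {"A": 0.2, "B": 0.7, "end": 0.1},
}
E = {  # emission probabilities e_k(x)
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.2, "y": 0.8},
}


def scaled_log_likelihood(seq, states=STATES, a=A, e=E):
    """Return log P(X) via a forward pass whose columns are rescaled to sum to 1.

    Assumes a non-empty sequence with P(X) > 0.
    """
    log_p = 0.0
    col = {"begin": 1.0}  # f(0) from (6.20)-(6.21)
    for symbol in seq:
        # One step of recursion (6.22), applied to the rescaled column.
        new = {
            l: e[l][symbol] * sum(col[k] * a[k].get(l, 0.0) for k in col)
            for l in states
        }
        c = sum(new.values())  # scaling factor c_i for this position
        log_p += math.log(c)   # kept in log space, so it does not underflow
        col = {l: v / c for l, v in new.items()}  # rescaled column sums to 1
    # Termination (6.23) on the rescaled column; log_p restores the true scale.
    return log_p + math.log(sum(col[k] * a[k].get("end", 0.0) for k in states))


print(scaled_log_likelihood("xy"))          # log(0.013), about -4.34
print(scaled_log_likelihood("xy" * 10000))  # finite, although P(X) itself underflows
```

For the 20000-symbol input, every unscaled step multiplies the total forward mass by at most 0.81 in this model, so the plain values of $f_{k}(i)$ would round to 0.0 in double precision, while the scaled version still returns a finite log-likelihood. The same normalizing constants can be reused to scale the backward values.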
Using the forward and backward probabilities we can compute $P(\pi_{i}=k\vert X)$. Since the process has memory of length 1, once $\pi_{i}=k$ is fixed, the suffix $(x_{i+1},\ldots,x_{L})$ no longer depends on the prefix $(x_{1},\ldots,x_{i})$, so we can write:

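$\begin{displaymath} \begin{split} P(X,\pi_{i}=k) &= P(x_{1},\ldots,x_{i},\pi_{i}=k) \cdot P(x_{i+1},\ldots,x_{L}\vert x_{1},\ldots,x_{i},\pi_{i}=k) = \\ &= P(x_{1},\ldots,x_{i},\pi_{i}=k) \cdot P(x_{i+1},\ldots,x_{L}\vert \pi_{i}=k) = \\ &= f_{k}(i) \cdot b_{k}(i) \end{split} \end{displaymath}$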

(6.27)
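Equation (6.27) can be checked on a tiny example by summing $P(X,\Pi)$ directly over every path $\Pi$ with $\pi_{i}=k$; this takes $\vert Q\vert^{L}$ terms and is only feasible for very short sequences. The Python sketch below is a hypothetical brute-force check, with a hypothetical toy model and hypothetical names (`path_probability`, `joint_by_enumeration`). For this model and X = xy, carrying out (6.22) and (6.25) by hand gives $f_{A}(1) = 0.45$, $b_{A}(1) = 0.016$, $f_{B}(2) = 0.092$ and $b_{B}(2) = 0.1$, so (6.27) predicts $P(X,\pi_{1}=A) = 0.0072$ and $P(X,\pi_{2}=B) = 0.0092$.

```python
from itertools import product

# Hypothetical two-state toy HMM over the alphabet {"x", "y"}.
# "begin" and "end" are silent states; a missing transition means probability 0.
STATES = ["A", "B"]
A = {  # transition probabilities a_{kl}
    "begin": {"A": 0.5, "B": 0.5},
    "A": {"A": 0.8, "B": 0.1, "end": 0.1},
    "B": {"A": 0.2, "B": 0.7, "end": 0.1},
}
E = {  # emission probabilities e_k(x)
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.2, "y": 0.8},
}


def path_probability(seq, path, a=A, e=E):
    """P(X, Pi) for one state path, including the begin and end transitions."""
    p = a["begin"].get(path[0], 0.0)
    for j, (state, symbol) in enumerate(zip(path, seq)):
        nxt = path[j + 1] if j + 1 < len(path) else "end"
        p *= e[state][symbol] * a[state].get(nxt, 0.0)
    return p


def joint_by_enumeration(seq, i, k, states=STATES):
    """P(X, pi_i = k) for 1-based i, summing over all |Q|^L paths with pi_i = k."""
    return sum(
        path_probability(seq, path)
        for path in product(states, repeat=len(seq))
        if path[i - 1] == k
    )


print(joint_by_enumeration("xy", 1, "A"))  # approximately 0.0072
print(joint_by_enumeration("xy", 2, "B"))  # approximately 0.0092
```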

Using the definition of conditional probability, we obtain the solution to the likelihood problem:

$\begin{displaymath} P(\pi_{i}=k \vert X) = \frac{P(X,\pi_{i}=k)}{P(X)} = \frac{f_{k}(i) \cdot b_{k}(i)}{P(X)} \end{displaymath}$

(6.28)
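Putting the pieces together, a minimal Python sketch of (6.28) runs both passes and divides their product by $P(X)$. It reuses no code from elsewhere; the toy model and the names `STATES`, `A`, `E` and `posteriors` are hypothetical, and it works with unscaled probabilities, so it is only suitable for short sequences.

```python
# Hypothetical two-state toy HMM over the alphabet {"x", "y"}.
# "begin" and "end" are silent states; a missing transition means probability 0.
STATES = ["A", "B"]
A = {  # transition probabilities a_{kl}
    "begin": {"A": 0.5, "B": 0.5},
    "A": {"A": 0.8, "B": 0.1, "end": 0.1},
    "B": {"A": 0.2, "B": 0.7, "end": 0.1},
}
E = {  # emission probabilities e_k(x)
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.2, "y": 0.8},
}


def posteriors(seq, states=STATES, a=A, e=E):
    """Return post with post[i][k] = P(pi_i = k | X) for i = 1..L (post[0] unused).

    Assumes a non-empty sequence with P(X) > 0; no scaling is applied.
    """
    L = len(seq)
    # Forward values, (6.20)-(6.22); f[0] holds only "begin" (others are 0).
    f = [{"begin": 1.0}]
    for i in range(L):  # seq[i] is x_{i+1}
        f.append({
            l: e[l][seq[i]] * sum(f[i][k] * a[k].get(l, 0.0) for k in f[i])
            for l in states
        })
    # Backward values, (6.24)-(6.25).
    b = [None] * (L + 1)
    b[L] = {k: a[k].get("end", 0.0) for k in states}
    for i in range(L - 1, 0, -1):
        b[i] = {
            k: sum(a[k].get(l, 0.0) * e[l][seq[i]] * b[i + 1][l] for l in states)
            for k in states
        }
    # P(X) from the forward termination (6.23).
    p = sum(f[L][k] * a[k].get("end", 0.0) for k in states)
    # Equation (6.28): P(pi_i = k | X) = f_k(i) * b_k(i) / P(X).
    return [None] + [
        {k: f[i][k] * b[i][k] / p for k in states} for i in range(1, L + 1)
    ]


post = posteriors("xy")
print(post[1], post[2])
```

Because the events $\pi_{i}=k$ for $k \in Q$ partition the paths that emit X, each `post[i]` sums to 1. On X = xy the sketch gives $P(\pi_{1}=A\vert X) \approx 0.554$ and $P(\pi_{2}=B\vert X) \approx 0.708$.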


Itshack Pe'er
1999-01-24