A path $\Pi = (\pi_1, \ldots, \pi_L)$ in the model is a sequence of states. We can now define the state
transition probabilities and the emission probabilities in
terms of $\Pi$, given a sequence $X = (x_1, \ldots, x_L)$:

$$a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k), \qquad e_k(b) = P(x_i = b \mid \pi_i = k) \eqno(6.6)$$

The probability that the sequence $X$ was generated by the model
given the path $\Pi$ is therefore:

$$P(X, \Pi) = a_{\pi_0 \pi_1} \cdot \prod_{i=1}^{L} e_{\pi_i}(x_i) \cdot a_{\pi_i \pi_{i+1}} \eqno(6.7)$$

where for our convenience we denote $\pi_0 = \mathrm{begin}$ and $\pi_{L+1} = \mathrm{end}$.
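For a fixed path, equation (6.7) can be evaluated directly. The following is a minimal sketch, assuming the model is given as nested dictionaries `a` (transitions, including a begin state) and `e` (emissions); the two-state model at the bottom is purely illustrative and not from the text.

```python
# Joint probability P(X, Pi) of a sequence X and a path Pi, per Eq. (6.7):
# P(X, Pi) = a[pi_0][pi_1] * prod_i ( e[pi_i](x_i) * a[pi_i][pi_{i+1}] ),
# with pi_0 = begin and pi_{L+1} = end.

def joint_probability(x, path, a, e, begin="begin", end="end"):
    """x: emitted symbols; path: states (same length); a: transition
    probabilities a[k][l]; e: emission probabilities e[k][b]."""
    full_path = [begin] + list(path)
    prob = 1.0
    for i, symbol in enumerate(x):
        state = full_path[i + 1]
        prob *= a[full_path[i]][state] * e[state][symbol]
    # Transition into the end state; 1.0 if the model has no explicit end.
    prob *= a[full_path[-1]].get(end, 1.0)
    return prob

# A hypothetical two-state model (all numbers are illustrative only):
a = {"begin": {"F": 0.5, "B": 0.5},
     "F": {"F": 0.9, "B": 0.1},
     "B": {"F": 0.1, "B": 0.9}}
e = {"F": {"h": 0.5, "t": 0.5},
     "B": {"h": 0.75, "t": 0.25}}
p = joint_probability("hh", "FF", a, e)  # 0.5 * 0.5 * 0.9 * 0.5 = 0.1125
```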
Example 6.3
An HMM for detecting CpG islands in a long DNA sequence.
The model contains eight states, corresponding to the four symbols
of the alphabet {A,C,G,T} inside (+) and outside (-) a CpG island:

State:            A+  C+  G+  T+  A-  C-  G-  T-
Emitted symbol:   A   C   G   T   A   C   G   T
If the probability of staying in a CpG island is p and the
probability of staying outside one is q, then the transition
probabilities are as described in Table 6.2 (derived from the
transition probabilities given in Table 6.1, under the assumption that we lose
memory when moving into or out of a CpG island, and that we ignore
background probabilities).
Table 6.2: Transition probabilities in the CpG islands HMM. Entries from a
"+" state to a "-" state (and vice versa) follow from the memoryless
assumption: the remaining probability mass (1-p, respectively 1-q) is split
evenly among the four states of the other group.

        A+       C+       G+       T+       A-       C-       G-       T-
A+      0.180p   0.274p   0.426p   0.120p   (1-p)/4  (1-p)/4  (1-p)/4  (1-p)/4
C+      0.171p   0.368p   0.274p   0.188p   (1-p)/4  (1-p)/4  (1-p)/4  (1-p)/4
G+      0.161p   0.339p   0.375p   0.125p   (1-p)/4  (1-p)/4  (1-p)/4  (1-p)/4
T+      0.079p   0.355p   0.384p   0.182p   (1-p)/4  (1-p)/4  (1-p)/4  (1-p)/4
A-      (1-q)/4  (1-q)/4  (1-q)/4  (1-q)/4  0.300q   0.205q   0.285q   0.210q
C-      (1-q)/4  (1-q)/4  (1-q)/4  (1-q)/4  0.322q   0.298q   0.078q   0.302q
G-      (1-q)/4  (1-q)/4  (1-q)/4  (1-q)/4  0.248q   0.246q   0.298q   0.208q
T-      (1-q)/4  (1-q)/4  (1-q)/4  (1-q)/4  0.177q   0.239q   0.292q   0.292q
In this special case the emission probability of each state
X+ or X- is exactly 1 for the symbol X and 0 for any
other symbol.
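The full 8x8 matrix of Table 6.2 can be assembled for any given p and q. A sketch, assuming (as above) that the leftover probability mass (1-p or 1-q) is split evenly among the four states of the other group; the function and variable names here are ours, not from the text.

```python
# Per-state transition weights within the "+" block and within the "-" block
# (the numeric factors of Table 6.2), in state order A, C, G, T.
P_PLUS = [[0.180, 0.274, 0.426, 0.120],
          [0.171, 0.368, 0.274, 0.188],
          [0.161, 0.339, 0.375, 0.125],
          [0.079, 0.355, 0.384, 0.182]]
P_MINUS = [[0.300, 0.205, 0.285, 0.210],
           [0.322, 0.298, 0.078, 0.302],
           [0.248, 0.246, 0.298, 0.208],
           [0.177, 0.239, 0.292, 0.292]]

def cpg_transition_matrix(p, q):
    """Return the 8x8 matrix over states A+, C+, G+, T+, A-, C-, G-, T-.

    "+" rows scale by p and spread the remaining 1-p evenly over the four
    "-" states; "-" rows do the symmetric thing with q.
    """
    matrix = []
    for row in P_PLUS:                      # from a "+" state
        matrix.append([v * p for v in row] + [(1 - p) / 4] * 4)
    for row in P_MINUS:                     # from a "-" state
        matrix.append([(1 - q) / 4] * 4 + [v * q for v in row])
    return matrix
```

Each row of the result sums to 1 (up to the rounding already present in Table 6.2), as a transition matrix must.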
Let us consider another example, where the emission probabilities
will not be degenerate.
Example 6.4
Suppose a dealer in a casino tosses a coin. We know the dealer may
use either a fair coin or a biased coin, which has probability 0.75 of
showing "heads". We also know that the dealer does not tend to change
coins: a switch happens only with probability 0.1 at each toss. Given a sequence of
coin tosses, we wish to determine when the dealer used the biased coin
and when the fair one.
The corresponding HMM is:
The states are $Q = \{F, B\}$,
where
$F$ stands for "fair" and $B$ for "biased".
The alphabet is $\Sigma = \{h, t\}$,
where $h$ stands for "heads" and $t$ for
"tails".
The emission probabilities are $e_F(h) = e_F(t) = \frac{1}{2}$,
$e_B(h) = \frac{3}{4}$ and $e_B(t) = \frac{1}{4}$, and the transition
probabilities are $a_{FF} = a_{BB} = 0.9$ and $a_{FB} = a_{BF} = 0.1$.
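Under these parameters we can sample a toss sequence together with its (normally hidden) path. A minimal simulation sketch; the assumption that the dealer starts with the fair coin is ours, and is not part of the example.

```python
import random

# Simulating the casino HMM of Example 6.4: switch coins with prob. 0.1;
# the fair coin emits h/t with prob. 0.5 each, the biased coin emits h
# with prob. 0.75.
TRANSITIONS = {"F": {"F": 0.9, "B": 0.1}, "B": {"B": 0.9, "F": 0.1}}
EMISSIONS = {"F": {"h": 0.5, "t": 0.5}, "B": {"h": 0.75, "t": 0.25}}

def simulate(length, start="F", rng=None):
    """Return (tosses, states): one sampled state path and its emissions."""
    rng = rng or random.Random()
    state, states, tosses = start, [], []
    for _ in range(length):
        states.append(state)
        # Emit a symbol from the current coin ...
        symbols, probs = zip(*EMISSIONS[state].items())
        tosses.append(rng.choices(symbols, weights=probs)[0])
        # ... then possibly switch coins before the next toss.
        nexts, weights = zip(*TRANSITIONS[state].items())
        state = rng.choices(nexts, weights=weights)[0]
    return tosses, states
```

An observer sees only the tosses; recovering the states from them is exactly the decoding problem discussed below.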
Returning to the general case, we have defined the probability
$P(X, \Pi)$ for a given sequence $X$ and a given path $\Pi$.
However, we do not know the actual sequence of states
that emitted $X$. We
therefore say that the generating path of $X$ is hidden.
Problem 6.5
The decoding problem.
INPUT: A hidden Markov model
and a sequence $X = (x_1, \ldots, x_L)$.
QUESTION: Find an optimal generating path $\Pi^*$
for $X$,
such that $P(X, \Pi^*)$
is maximized. We denote this also by:

$$\Pi^* = \arg\max_{\Pi} P(X, \Pi)$$
In the CpG islands case (Problem 6.2),
the optimal path can help us find the location of the islands. Had
we known $\Pi^*$,
we could have traversed it, determining that
all the parts that pass through the "+" states are CpG islands.
Similarly, in the coin-tossing case (Example
6.4), the parts of $\Pi^*$
that pass
through the $B$ (biased) state are suspected tosses of the biased
coin.
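For very short sequences, the decoding problem can be solved by brute force: enumerate all $|Q|^L$ paths and keep the most probable one. The sketch below uses the coin-tossing model of Example 6.4, with an assumed uniform initial distribution; its exponential cost is what the algorithm of the next section avoids.

```python
from itertools import product

# Brute-force decoding: try every state path and keep the one maximizing
# P(X, Pi). Exponential in the sequence length L -- only for tiny inputs.
# Parameters are the casino model of Example 6.4; the uniform initial
# distribution is an assumption.
STATES = ("F", "B")
INIT = {"F": 0.5, "B": 0.5}
TRANS = {"F": {"F": 0.9, "B": 0.1}, "B": {"B": 0.9, "F": 0.1}}
EMIT = {"F": {"h": 0.5, "t": 0.5}, "B": {"h": 0.75, "t": 0.25}}

def decode_brute_force(x):
    """Return (best_path, best_prob) over all |STATES|**len(x) paths."""
    best_path, best_prob = None, -1.0
    for path in product(STATES, repeat=len(x)):
        prob = INIT[path[0]] * EMIT[path[0]][x[0]]
        for i in range(1, len(x)):
            prob *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][x[i]]
        if prob > best_prob:
            best_path, best_prob = path, prob
    return best_path, best_prob
```

For a run of four heads the best path stays on the biased coin throughout, while for four tails it stays on the fair coin, matching the intuition above.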
A solution for the optimal path problem is described in the next
section.
Itshack Pe`er
1999-01-24