Profile HMMs


HMMs can be used to align a string against a given profile, which helps us solve the multiple alignment problem.

We define a profile $\mathcal{P}$ of length $L$ as a set of probabilities: for each symbol $b \in \Sigma$ and each position $1 \leq i \leq L$, $e_{i}(b)$ is the probability of observing $b$ at the $i^{\text{th}}$ position. The probability of a string $X=(x_{1},\ldots,x_{L})$ given the profile $\mathcal{P}$ is then:

$$P(X \mid \mathcal{P}) = \prod_{i=1}^{L} e_{i}(x_{i}) \tag{42}$$

We can also calculate a log-odds (likelihood ratio) score for the ungapped alignment of $X$ against the profile $\mathcal{P}$:

$$\mathrm{Score}(X \mid \mathcal{P}) = \sum_{i=1}^{L} \log\frac{e_{i}(x_{i})}{p(x_{i})} \tag{43}$$

where $p(b)$ is the background frequency of the symbol $b$.
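Equations (42) and (43) can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary-per-position representation of the profile, and the toy numbers are assumptions, not part of the lecture. The sketch also assumes all emission and background probabilities are strictly positive, so that the logarithm is defined.

```python
import math

def profile_probability(x, emissions):
    """P(X | profile), eq. (42): product of e_i(x_i) over the L positions.

    `emissions` is a hypothetical representation: a list of L dicts,
    where emissions[i][b] holds e_{i+1}(b).
    """
    if len(x) != len(emissions):
        raise ValueError("an ungapped alignment needs len(x) == L")
    prob = 1.0
    for symbol, e_i in zip(x, emissions):
        prob *= e_i[symbol]
    return prob

def log_odds_score(x, emissions, background):
    """Score(X | profile), eq. (43): sum of log(e_i(x_i) / p(x_i))."""
    if len(x) != len(emissions):
        raise ValueError("an ungapped alignment needs len(x) == L")
    return sum(
        math.log(e_i[symbol] / background[symbol])
        for symbol, e_i in zip(x, emissions)
    )

# Toy profile of length L = 3 over a DNA alphabet (made-up numbers).
toy_emissions = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
toy_background = {b: 0.25 for b in "ACGT"}

print(profile_probability("AGT", toy_emissions))
print(log_odds_score("AGT", toy_emissions, toy_background))
```

In the toy profile, the third position has the same distribution as the background, so it contributes nothing to the score; only positions whose emissions differ from the background move the score away from zero.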

This leads to the definition of the following HMM: all the states are match states $M_{1},\ldots,M_{L}$, which correspond to matches of the string's symbols with the profile positions. These states are sequentially linked (i.e., each match state $M_{j}$ is linked to its successor $M_{j+1}$), as shown in figure 6.2. The emission probability of the symbol $b$ from the state $M_{j}$ is $e_{j}(b)$.

**Figure 6.2:** Match states in a profile HMM

To allow insertions, we also add insertion states $I_{0},\ldots,I_{L}$ to the model. We shall assume that:

$$\forall\, b \in \Sigma: \quad e_{I_{j}}(b) = p(b)$$

Each insertion state $I_{j}$ has an incoming link from the corresponding match state $M_{j}$, an outgoing link to the next match state $M_{j+1}$, and a self-loop (see figure 6.3). Because insertion emissions equal the background frequencies, each inserted symbol adds $\log(p(b)/p(b)) = 0$ to the score, so a gap contributes only through its transitions. Assigning appropriate probabilities to these transitions corresponds to applying affine gap penalties, since the overall contribution of a gap of length $h$ to the logarithmic likelihood score is:

$$\underbrace{\log(a_{M_{j}I_{j}}) + \log(a_{I_{j}M_{j+1}})}_{\text{gap creation}} + \underbrace{(h-1)\cdot\log(a_{I_{j}I_{j}})}_{\text{gap extension}}$$
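The affine behaviour can be checked numerically. The sketch below assumes that a gap of length $h$ enters $I_{j}$ once from $M_{j}$, takes the self-loop $h-1$ times, and leaves once to $M_{j+1}$; the function name, parameter names, and transition probabilities are made up for illustration.

```python
import math

def insertion_gap_log_score(h, a_match_to_insert, a_insert_to_insert,
                            a_insert_to_match):
    """Transition contribution of an insertion of length h at one state I_j.

    Assumed path: M_j -> I_j, then (h - 1) self-loops on I_j, then
    I_j -> M_{j+1}. Emissions are ignored because insert states emit
    with the background distribution.
    """
    if h < 1:
        raise ValueError("a gap must contain at least one symbol")
    creation = math.log(a_match_to_insert) + math.log(a_insert_to_match)
    extension = (h - 1) * math.log(a_insert_to_insert)
    return creation + extension

# Hypothetical transition probabilities, for illustration only.
for h in (1, 2, 3, 4):
    print(h, insertion_gap_log_score(h, 0.05, 0.4, 0.6))
```

Each additional inserted symbol changes the score by the same amount, $\log(a_{I_{j}I_{j}})$, while the creation term is paid once per gap. That is exactly the shape of an affine gap penalty.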

**Figure 6.3:** A profile HMM with an insertion state (and some match states)

To allow deletions as well, we add the deletion states $D_{1},\ldots,D_{L}$. These states cannot emit any symbol and are therefore called silent (note that the begin and end states are silent as well). The deletion states are sequentially linked in the same manner as the match states, and they are also interleaved with the match states (see figure 6.4).

**Figure 6.4:** Profile HMM with deletion and match states

To model both insertions and deletions, we have to add a link from $D_{j}$ to $I_{j}$ and a link from $I_{j}$ to $D_{j+1}$.

The full HMM for modeling the profile $\mathcal{P}$ of length $L$ consists of $L$ layers, each with three states: $M_{j}$, $I_{j}$ and $D_{j}$. To complete the model, we add begin and end states, connected to the layers as shown in figure 6.5. This model is due to Haussler et al. [5].

**Figure 6.5:** Profile HMM for global alignment
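The topology described above can be written down as a set of allowed transitions. The following sketch is one possible encoding, not the lecture's own: the function name and the string labels for states are hypothetical, and it assumes that the begin state behaves like a match state $M_{0}$ and the end state like $M_{L+1}$, which may differ in detail from figure 6.5.

```python
def profile_hmm_transitions(L):
    """Return the set of allowed (source, target) transitions for L layers.

    Assumption: "Begin" plays the role of M_0 and "End" the role of
    M_{L+1}. There is no D_0 and no D_{L+1}.
    """
    if L < 1:
        raise ValueError("the profile must have at least one position")

    def match(j):
        if j == 0:
            return "Begin"
        if j == L + 1:
            return "End"
        return f"M{j}"

    edges = set()
    for j in range(L + 1):
        # States of layer j that can be left.
        sources = [match(j), f"I{j}"]
        if j >= 1:
            sources.append(f"D{j}")
        # States reachable from layer j: next match, same insert, next delete.
        targets = [match(j + 1), f"I{j}"]
        if j + 1 <= L:
            targets.append(f"D{j + 1}")
        for s in sources:
            for t in targets:
                edges.add((s, t))
    return edges

edges = profile_hmm_transitions(3)
print(sorted(edges))
```

Every layer can move to the next match state, stay in its own insertion state, or skip ahead through the next deletion state, so the links named in the text ($M_{j} \to M_{j+1}$, $I_{j} \to I_{j}$, $D_{j} \to I_{j}$, $I_{j} \to D_{j+1}$, and so on) all appear in the returned set. Attaching a probability to each pair would turn this edge set into a transition table.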


Peer Itsik
2000-12-19