
Gibbs Sampling

Problem 6.7 Locating a common pattern.
INPUT: A set of sequences $\mathcal{S} = S^{(1)},\ldots,S^{(n)}$ and an integer w.
QUESTION: For each string $S^{(i)}$, find a sub-string of length at most w, so that the similarity between the n sub-strings is maximized.

Let $a^{(1)},\ldots,a^{(n)}$ be the starting indices of the chosen sub-strings in $S^{(1)},\ldots,S^{(n)}$, respectively. We introduce the following notation:

Let $c_{ij}$ be the number of occurrences of the symbol $j \in \Sigma$ among the $i^{\text{th}}$ positions of the n sub-strings: $\{s^{(1)}_{a^{(1)}+i-1},\ldots,s^{(n)}_{a^{(n)}+i-1}\}$.
Let $q_{ij}$ denote the probability that the symbol j occurs at the $i^{\text{th}}$ position of the pattern.
Let $p_j$ denote the frequency of the symbol j in all sequences of $\mathcal{S}$.
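The three quantities can be tabulated directly from the chosen windows. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, start indices are 0-based (the text uses 1-based indices), every window is assumed to have length exactly w, and $q_{ij}$ is estimated from $c_{ij}$ with an additive pseudocount, which is an assumption since the text does not say how $q_{ij}$ is obtained.

```python
from collections import Counter

def count_matrix(seqs, starts, w):
    """c[i][j]: occurrences of symbol j at position i (0-based) of the chosen windows."""
    counts = [Counter() for _ in range(w)]
    for s, a in zip(seqs, starts):
        for i, symbol in enumerate(s[a:a + w]):
            counts[i][symbol] += 1
    return counts

def background(seqs):
    """p[j]: relative frequency of symbol j over all sequences."""
    total = Counter()
    for s in seqs:
        total.update(s)
    n_symbols = sum(total.values())
    return {j: total[j] / n_symbols for j in total}

def position_probs(counts, alphabet, pseudo=1.0):
    """q[i][j]: estimate from counts with an additive pseudocount (assumed here)."""
    q = []
    for col in counts:
        denom = sum(col.values()) + pseudo * len(alphabet)
        q.append({j: (col[j] + pseudo) / denom for j in alphabet})
    return q

seqs = ["ACGT", "TACG", "GGAC"]
starts = [0, 1, 2]          # each window of length 2 reads "AC"
c = count_matrix(seqs, starts, w=2)
p = background(seqs)
q = position_probs(c, "ACGT")
```

The pseudocount keeps every $q_{ij}$ strictly positive, so the logarithm in the score is always defined.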

We therefore wish to maximize the logarithmic likelihood score:

$$\mathrm{Score} = \sum_{i=1}^{w} \sum_{j \in \Sigma} c_{ij} \cdot \log \frac{q_{ij}}{p_{j}}$$

(6.62)
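As an illustration, the score (6.62) can be evaluated as a double sum over positions and symbols. This is a sketch under stated assumptions: the function name and the data layout (a list of per-position dictionaries for c and q, one dictionary for p) are hypothetical, and the natural logarithm is used because the text does not fix a base.

```python
import math

def log_likelihood_score(c, q, p):
    """Sum over positions i and symbols j of c[i][j] * log(q[i][j] / p[j])."""
    score = 0.0
    for c_i, q_i in zip(c, q):
        for j, count in c_i.items():
            if count > 0:  # symbols that never occur contribute nothing
                score += count * math.log(q_i[j] / p[j])
    return score

# One pattern position where "A" occurs 3 times and is twice as likely
# in the pattern as in the background.
example = log_likelihood_score([{"A": 3}], [{"A": 0.5}], {"A": 0.25})

# When the pattern probabilities equal the background, the score is zero.
neutral = log_likelihood_score([{"A": 2, "C": 1}],
                               [{"A": 0.5, "C": 0.5}],
                               {"A": 0.5, "C": 0.5})
```

A positive term means the symbol is more frequent at that pattern position than in the background, so the score rewards windows whose columns are far from the background distribution.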

To accomplish this task, we perform the following iterative procedure:

1. Initialization: Randomly choose $a^{(1)},\ldots,a^{(n)}$.
2. Randomly choose $1 \leq z \leq n$ and calculate the $c_{ij}$, $q_{ij}$ and $p_j$ values for the strings in $\mathcal{S} \setminus S^{(z)}$.
3. Find the best substring of $S^{(z)}$ according to the model, and determine the new value of $a^{(z)}$. This is done by applying the local alignment algorithm to $S^{(z)}$ against the profile of the current pattern.
4. Repeat steps 2 and 3 until the improvement of the score is less than $\epsilon$.
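The steps above can be sketched as follows. This is a simplified illustration rather than the procedure of the notes: all names are hypothetical, indices are 0-based, windows have length exactly w, and step 3 uses an ungapped scan over all windows of $S^{(z)}$ in place of local alignment against the profile. Two further assumptions are labeled in the code: pseudocounts keep all probabilities positive, and the stopping test compares the score after every n updates, since a single update that leaves $a^{(z)}$ unchanged would otherwise stop the loop at once.

```python
import math
import random
from collections import Counter

def build_model(seqs, starts, w, alphabet, pseudo=1.0):
    """Return (c, q, p) for the given sequences; pseudocounts are an assumption."""
    c = [Counter() for _ in range(w)]
    for s, a in zip(seqs, starts):
        for i, sym in enumerate(s[a:a + w]):
            c[i][sym] += 1
    denom = len(seqs) + pseudo * len(alphabet)
    q = [{j: (col[j] + pseudo) / denom for j in alphabet} for col in c]
    total = Counter("".join(seqs))
    n_sym = sum(total.values()) + pseudo * len(alphabet)
    p = {j: (total[j] + pseudo) / n_sym for j in alphabet}
    return c, q, p

def score(c, q, p):
    """Log-likelihood score of equation (6.62), natural logarithm."""
    return sum(cnt * math.log(q_i[j] / p[j])
               for c_i, q_i in zip(c, q) for j, cnt in c_i.items())

def best_start(s, q, p, w):
    """Ungapped scan: the start whose window has the highest log-odds."""
    def window_score(a):
        return sum(math.log(q[i][s[a + i]] / p[s[a + i]]) for i in range(w))
    return max(range(len(s) - w + 1), key=window_score)

def find_pattern(seqs, w, eps=1e-6, max_rounds=100, seed=0):
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]    # step 1
    cur = score(*build_model(seqs, starts, w, alphabet))
    for _ in range(max_rounds):
        prev = cur
        for _ in range(len(seqs)):                            # one round of n updates
            z = rng.randrange(len(seqs))                      # step 2
            rest = seqs[:z] + seqs[z + 1:]
            rest_starts = starts[:z] + starts[z + 1:]
            _, q, p = build_model(rest, rest_starts, w, alphabet)
            starts[z] = best_start(seqs[z], q, p, w)          # step 3
        cur = score(*build_model(seqs, starts, w, alphabet))
        if cur - prev < eps:                                  # step 4
            break
    return starts, cur

seqs = ["TTGACGTCAT", "GGACGTCTTA", "CCCACGTCGG", "ATACGTCAAA"]
starts, final_score = find_pattern(seqs, w=5, seed=1)
```

Because the result depends on the random initialization, running the sketch with several seeds and keeping the highest-scoring set of start positions is a natural way to use it.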

Unlike the profile HMM technique, the Gibbs sampling algorithm (due to Lawrence et al. [8]) does not rely on any substantial theoretical basis. However, the method is known to work in specific cases.
Known problems:

Phase shift - The algorithm may converge on an offset of the best pattern.
The value of w is usually unknown. Choosing different values for w may significantly change the results.
The strings may contain more than a single common pattern.
As is the case with the Baum-Welch algorithm, the process may converge to a local maximum.


Itshack Pe'er
1999-01-24