
   
Gibbs Sampling

Problem 6.7   Locating a common pattern.
INPUT: A set of sequences $\mathcal{S} = S^{(1)},\ldots,S^{(n)}$ and an integer $w$.
QUESTION: For each string $S^{(i)}$, find a sub-string of length $w$, so that the similarity between the $n$ sub-strings is maximized.

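Let $a^{(1)},\ldots,a^{(n)}$ be the starting indices of the chosen sub-strings in $S^{(1)},\ldots,S^{(n)}$, respectively. We introduce the following notation: $c_{ij}$ is the number of chosen sub-strings whose $i$-th character is $j \in \Sigma$; $q_{ij} = c_{ij}/n$ is the frequency of character $j$ at position $i$ of the pattern; and $p_j$ is the background frequency of character $j$ in the sequences. We then wish to maximize the log-likelihood score: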

\begin{displaymath}\mathrm{Score} = \sum_{i=1}^{w} \sum_{j \in \Sigma} c_{ij} \cdot \log\frac{q_{ij}}{p_{j}} \qquad (6.62)
\end{displaymath}
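As a concrete check of the score, here is a minimal Python sketch that evaluates it for given start positions. It assumes $q_{ij} = c_{ij}/n$ and takes $p_j$ to be the frequency of character $j$ over all positions of all input sequences (the background model is an assumption); the function name `pattern_score` is illustrative. Terms with $c_{ij} = 0$ are skipped, following the usual convention $0 \cdot \log 0 = 0$.

```python
import math
from collections import Counter


def pattern_score(seqs, starts, w):
    """Evaluate Score = sum_i sum_j c_ij * log(q_ij / p_j) for the chosen sub-strings.

    Assumptions: q_ij = c_ij / n, and p_j is the frequency of character j over
    all positions of all input sequences.
    """
    n = len(seqs)
    # c[i][j]: number of chosen sub-strings whose i-th character is j
    c = [Counter(s[a + i] for s, a in zip(seqs, starts)) for i in range(w)]
    # p[j]: background frequency of character j (assumed model)
    totals = Counter("".join(seqs))
    n_chars = sum(totals.values())
    p = {ch: k / n_chars for ch, k in totals.items()}
    score = 0.0
    for i in range(w):
        # characters with c_ij = 0 are absent from c[i]; they contribute 0 (0 * log 0 = 0)
        for j, cij in c[i].items():
            score += cij * math.log((cij / n) / p[j])
    return score


if __name__ == "__main__":
    seqs = ["ACGTA", "TACGT", "GGACG"]
    # all three sub-strings are "ACG": 3 * log(56.25), about 12.09
    print(pattern_score(seqs, [0, 1, 2], 3))
    # "CGT", "TAC", "GGA": a much lower score
    print(pattern_score(seqs, [1, 0, 0], 3))
```

Aligning the three occurrences of ACG makes every column fully conserved, so each column contributes $3 \log(1/p_j)$; a misaligned choice spreads the counts over several characters and lowers the score.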



To accomplish this task, we perform the following iterative procedure:
1.
Initialization: Randomly choose $a^{(1)},\ldots,a^{(n)}$.
2.
Randomly choose $1 \leq z \leq n$ and calculate the $c_{ij}$, $q_{ij}$ and $p_j$ values for the strings in $\mathcal{S} \setminus S^{(z)}$.
3.
Find the best sub-string of $S^{(z)}$ according to the model, and set $a^{(z)}$ to its starting index. This is done by applying the local alignment algorithm to $S^{(z)}$ against the profile of the current pattern.
4.
Repeat steps 2 and 3 until the improvement of the score is less than $\epsilon$.
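The iterative procedure above can be sketched in Python as follows. This is a hedged illustration under several assumptions, not the implementation of Lawrence et al.: $p_j$ is taken over all positions of all sequences; a pseudocount (hypothetical parameter `pseudo`) is added to every count so that no $q_{ij}$ is zero and no window scores $\log 0$; step 3 is approximated by scoring every gapless window of length $w$ in $S^{(z)}$ against the profile; and the $\epsilon$ test of step 4 is replaced by a patience rule (hypothetical parameter `patience`), because a single update that leaves $a^{(z)}$ unchanged would otherwise end the search immediately. The names `find_pattern`, `profile`, `window_score`, `total_score` and the optional `init` argument are illustrative only. Every sequence is assumed to have length at least $w$.

```python
import math
import random
from collections import Counter


def background(seqs):
    """p_j: frequency of character j over all positions of all sequences (assumed model)."""
    total = Counter()
    for s in seqs:
        total.update(s)
    n_chars = sum(total.values())
    return {ch: k / n_chars for ch, k in total.items()}


def profile(seqs, starts, w, skip, alphabet, pseudo):
    """q_ij from the chosen sub-strings of all sequences except index `skip`.

    A pseudocount `pseudo` is added to every c_ij so that no q_ij is zero (assumption).
    """
    counts = [Counter() for _ in range(w)]
    m = 0
    for k, (s, a) in enumerate(zip(seqs, starts)):
        if k == skip:
            continue
        m += 1
        for i in range(w):
            counts[i][s[a + i]] += 1
    denom = m + pseudo * len(alphabet)
    return [{ch: (counts[i][ch] + pseudo) / denom for ch in alphabet} for i in range(w)]


def window_score(s, x, q, p):
    """Log-likelihood ratio of the length-w window of s that starts at x."""
    return sum(math.log(q[i][s[x + i]] / p[s[x + i]]) for i in range(len(q)))


def total_score(seqs, starts, w, alphabet, pseudo, p):
    """Equation (6.62) with smoothed q_ij: the sum of the window scores of all n sub-strings."""
    q = profile(seqs, starts, w, None, alphabet, pseudo)
    return sum(window_score(s, a, q, p) for s, a in zip(seqs, starts))


def find_pattern(seqs, w, eps=1e-6, patience=None, max_iter=10_000,
                 pseudo=0.5, seed=0, init=None):
    rng = random.Random(seed)
    n = len(seqs)
    if patience is None:
        patience = 2 * n  # hypothetical stopping rule standing in for the plain epsilon test
    alphabet = sorted(set("".join(seqs)))
    p = background(seqs)
    # Step 1: random initial start positions a^(1), ..., a^(n) (or the given `init`)
    if init is not None:
        starts = list(init)
    else:
        starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    best = total_score(seqs, starts, w, alphabet, pseudo, p)
    best_starts, stale = list(starts), 0
    for _ in range(max_iter):
        # Step 2: pick z at random and build the profile from the other n - 1 sub-strings
        z = rng.randrange(n)
        q = profile(seqs, starts, w, z, alphabet, pseudo)
        # Step 3: score every length-w window of S^(z) against the profile, keep the best
        s = seqs[z]
        starts[z] = max(range(len(s) - w + 1), key=lambda x: window_score(s, x, q, p))
        # Step 4: stop after `patience` consecutive updates that fail to raise
        # the best score seen so far by more than eps
        new = total_score(seqs, starts, w, alphabet, pseudo, p)
        if new > best + eps:
            best, best_starts, stale = new, list(starts), 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_starts, best


if __name__ == "__main__":
    seqs = ["XXACGT", "ACGTYY", "ZACGTZ"]
    starts, score = find_pattern(seqs, 4, seed=1)
    print(starts, [s[a:a + 4] for s, a in zip(seqs, starts)], round(score, 3))
```

Taking the maximum in step 3 follows the description in these notes. In the sampler of Lawrence et al., the new $a^{(z)}$ is instead drawn at random with probability proportional to the window's likelihood ratio; replacing the `max` call with `rng.choices` over weights `exp(window_score(...))` would give that stochastic update. A greedy run can stop at a local optimum (for example, a pattern shifted by one position), so running several seeds and keeping the best result is a cheap safeguard.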
Unlike the profile HMM technique, the Gibbs sampling algorithm (due to Lawrence et al. [8]) does not rest on a substantial theoretical basis. Nevertheless, the method is known to work in specific cases.
Known problems:
Itshack Pe'er
1999-01-24