Carrillo and Lipman [1] found a heuristic method for accelerating the search
for the best multiple alignment.
The method relies on the observation that if the strings are relatively similar, the alignment path stays
close to the main diagonal, so not all the values in the multi-dimensional cube need to be calculated. We now
detail this algorithm.
Assuming an upper bound on the cost of the best alignment is known, we discard alignments that are
known a priori to cost more than this bound.
Let $A$ be an alignment of the strings
$x_1, \ldots, x_k$.
Denote by $A_{i,j}$ the pair of rows in $A$ containing only $x_i$ and $x_j$, and by
$c(A_{i,j})$ the cost of this pairwise
alignment.
Denote by $c(A)$ the total cost of $A$, and
suppose we define
\[ c(A) = \sum_{i<j} c(A_{i,j}). \]
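Let $A^*$ be the optimal alignment (the one with the minimal cost),
and suppose we know that
$c(A^*) \le z$ for some upper bound $z$.
Therefore, for any pair $u < v$, since each projected alignment $A^*_{i,j}$ costs at least $D(x_i,x_j)$,
\[ z \ge c(A^*) = \sum_{i<j} c(A^*_{i,j}) \ge c(A^*_{u,v}) + \sum_{i<j,\ (i,j) \ne (u,v)} D(x_i,x_j), \]
where $D(x,y)$ is the optimal score for aligning strings $x$ and $y$.
It follows that
\[ c(A^*_{u,v}) \le B(u,v) = z - \sum_{i<j,\ (i,j) \ne (u,v)} D(x_i,x_j). \]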
Here $A^*_{u,v}$ is the projection of $A^*$ onto the $uv$-plane. By calculating
$D(x_i,x_j)$ for each $i$ and $j$,
we can find
$B(u,v)$ for every pair $u,v$.
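
As a minimal sketch of this step, the following Python computes the pairwise optimal costs and then a bound for each pair. It assumes that B(u,v) is the upper bound on the total cost minus the sum of the optimal pairwise costs of all other pairs, and it assumes a unit-cost scheme (gap and mismatch cost 1, match cost 0). The names `pair_cost`, `projection_bounds` and `z` are illustrative and do not come from the text.

```python
from itertools import combinations


def pair_cost(x, y, gap=1, mismatch=1):
    """Optimal pairwise alignment cost D(x, y) under an assumed unit-cost scheme."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap]
        for j in range(1, len(y) + 1):
            sub = prev[j - 1] + (0 if x[i - 1] == y[j - 1] else mismatch)
            cur.append(min(sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def projection_bounds(strings, z):
    """Return {(u, v): B(u, v)}, where z is an assumed upper bound on the total cost."""
    pairs = list(combinations(range(len(strings)), 2))
    d = {(i, j): pair_cost(strings[i], strings[j]) for i, j in pairs}
    total = sum(d.values())
    # B(u, v) = z minus the optimal costs of every pair other than (u, v).
    return {p: z - (total - d[p]) for p in pairs}
```

For instance, `projection_bounds(["ACGT", "AGT", "ACT"], 4)` gives one bound per pair of strings, indexed by 0-based positions in the input list.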
Now, consider a cell
$(i_1, \ldots, i_k)$
whose projection to the $uv$-plane is $(s,t)$, that is, $i_u = s$ and $i_v = t$.
If the best alignment $A^*$ passes through this cell, then its projection $A^*_{u,v}$ passes through $(s,t)$,
and its cost satisfies
\[ c(A^*_{u,v}) \ge \mathrm{best}(u,v)_{s,t}, \]
where
$\mathrm{best}(u,v)_{s,t}$ is a lower
bound on the cost of any alignment of $x_u$ and $x_v$ through $(s,t)$ in the $uv$-plane. We can compute such a bound as:
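\[ \mathrm{best}(u,v)_{s,t} = D(x_u[1..s-1],\, x_v[1..t-1]) + c(x_u[s], x_v[t]) + D(x_u[s+1..],\, x_v[t+1..]), \]
where $x[a..b]$ denotes the substring of $x$ from position $a$ to position $b$ (through the end of $x$ when $b$ is omitted), and
$c(x_u[s], x_v[t])$
is the cost of matching the characters $x_u[s]$
and $x_v[t]$.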
Therefore, if
$\mathrm{best}(u,v)_{s,t} > B(u,v)$, then the best alignment $A^*$ cannot pass through any cell
$(i_1, \ldots, i_k)$
with $i_u = s$ and $i_v = t$,
and
these cells can be discarded from the computation.
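
The pruning test for a single pair of strings can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it takes the value for cell (s,t) to be the optimal cost of aligning the prefixes before positions s and t, plus the cost of matching the two characters at s and t, plus the optimal cost of aligning the remaining suffixes, and it uses unit costs (gap and mismatch 1, match 0). The names `dp_table` and `surviving_cells` are hypothetical.

```python
def dp_table(x, y, gap=1, mismatch=1):
    """F[i][j] = optimal cost of aligning x[:i] with y[:j] (assumed unit costs)."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i == 0 or j == 0:
                F[i][j] = (i + j) * gap
                continue
            sub = F[i - 1][j - 1] + (0 if x[i - 1] == y[j - 1] else mismatch)
            F[i][j] = min(sub, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F


def surviving_cells(x, y, bound, gap=1, mismatch=1):
    """Return the 1-based cells (s, t) whose value does not exceed `bound`."""
    n, m = len(x), len(y)
    prefix = dp_table(x, y, gap, mismatch)
    # Aligning the reversed strings gives suffix costs:
    # suffix[a][b] = optimal cost of aligning the last a chars of x with the last b of y.
    suffix = dp_table(x[::-1], y[::-1], gap, mismatch)
    keep = set()
    for s in range(1, n + 1):
        for t in range(1, m + 1):
            match = 0 if x[s - 1] == y[t - 1] else mismatch
            best = prefix[s - 1][t - 1] + match + suffix[n - s][m - t]
            if best <= bound:  # cells with best > bound are discarded
                keep.add((s, t))
    return keep
```

In the multi-dimensional setting, a cell of the cube would be kept only if its projection survives this test for every pair (u,v), with `bound` set to B(u,v) for that pair. With identical strings and a bound of 0, only the cells on the main diagonal survive, which matches the intuition that similar strings align close to the diagonal.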
Peer Itsik
2000-12-06