Next: Gap Penalty
Up: End free-space alignment
Previous: End free-space alignment
Motivation
One example where end-spaces should be free
is in the shotgun sequence assembly procedure. In this problem, one has a
large set of partially overlapping subsequences that come from many
copies of one original but unknown DNA sequence. The problem is to use
comparisons of pairs of subsequences to infer the correct original
sequence. Two random subsequences from the set are unlikely
to have nearby starting positions
along the original sequence, and this is reflected by a low
end-space free alignment score for those two subsequences. But if two
subsequences do overlap in the original sequence, then a suffix of one can be
aligned to a prefix of the other with only a small number of spaces and mismatches. Such an overlap is detected because it receives a high score in an end-space free weighted alignment.
Similarly, the case where one subsequence contains another can be
detected in this way. See Figure 2.6 for an illustration.
Figure 2.6: Sequence assembly.
When comparing two sequences, it is not obvious how to position them relative to each other
so that the similarity between the two is maximal. One possibility, called the ends-free problem, is to disregard leading
and trailing indel operations (in the usual similarity strategy, all indel operations
reduce the similarity).
To implement this, we change the algorithm presented for the
global alignment problem as follows:
- Set initial conditions:
  $V(i, 0) = 0$ for $0 \le i \le n$, and $V(0, j) = 0$ for $0 \le j \le m$
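- Use the same recurrence for
  $1 \le i \le n$,
  $1 \le j \le m$:
  $V(i, j) = \max \{\, V(i-1, j-1) + \sigma(S_i, T_j),\ V(i-1, j) + \sigma(S_i, -),\ V(i, j-1) + \sigma(-, T_j) \,\}$,
  where $S$ and $T$ are the two sequences, of lengths $n$ and $m$, $\sigma(x, y)$ is the score of aligning $x$ with $y$, and $-$ denotes a space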
- Instead of looking at V(n, m), the algorithm searches for
i* and j* such that:
  $V(n, i^*) = \max_{0 \le i \le m} V(n, i)$ and $V(j^*, m) = \max_{0 \le j \le n} V(j, m)$
- The similarity will be defined as:
  $\max \{\, V(n, i^*),\ V(j^*, m) \,\}$
Looking for i* means searching for a cell in the last row of
the table, produced while computing V(n,m). Looking for j* means searching
for a cell in the last column of the same table. This eliminates trailing indel
operations. Leading indel operations will not be taken into account due to the changes in the initial conditions.
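The modified algorithm can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function name `end_free_score` and the scoring parameters `match`, `mismatch` and `space` are assumptions introduced here, standing in for whatever scoring scheme is used for global alignment. The sketch fills the full table and then takes the best value over the last row and the last column.

```python
def end_free_score(s, t, match=1, mismatch=-1, space=-1):
    """Ends-free similarity of s and t (hypothetical helper, simple scoring)."""
    n, m = len(s), len(t)
    # Initial conditions: row 0 and column 0 are all zeros,
    # so leading spaces cost nothing.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # Same recurrence as in global alignment.
            V[i][j] = max(V[i - 1][j - 1] + sub,  # align s[i] with t[j]
                          V[i - 1][j] + space,    # s[i] against a space
                          V[i][j - 1] + space)    # t[j] against a space
    # Trailing spaces are free: take the best cell in the last row
    # or in the last column instead of V[n][m].
    best_last_row = max(V[n])
    best_last_col = max(V[i][m] for i in range(n + 1))
    return max(best_last_row, best_last_col)


# Suffix "TGCA" of the first read overlaps the prefix "TGCA" of the second.
print(end_free_score("ACGTTGCA", "TGCAGGCT"))
# The second read is contained in the first.
print(end_free_score("GGACGTGG", "ACGT"))
```

With these assumed scores, both the suffix-prefix overlap and the containment case obtain the score of their four matching characters, while two reads with no characters in common score 0.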
Complexity:
- Time complexity. Computing the matrix takes O(nm). Finding j* and
i* takes O(n+m). Therefore the total complexity remains O(nm).
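- Space complexity. If only the similarity score is needed, the whole
table need not be stored: it suffices to keep the row currently being
computed and the previous one, so computing the matrix takes
O(n + m) space. Computing
the maximizing values i*, j* requires the last row and
column to be saved, which is also O(n + m). Therefore the total
space complexity remains
O(n + m). Recovering the alignment itself by a traceback over the stored table would require O(nm) space.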
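The space bound can be illustrated with a score-only variant that keeps two rows of the table at a time. Again this is only a sketch under assumed names and scores (`end_free_score_linear`, `match`, `mismatch`, `space` are not from the notes). Instead of storing the last column explicitly, it keeps a running maximum of its entries, which is enough when only the score is wanted.

```python
def end_free_score_linear(s, t, match=1, mismatch=-1, space=-1):
    """Ends-free similarity using two rows of the table (hypothetical helper)."""
    n, m = len(s), len(t)
    prev = [0] * (m + 1)      # row 0: V(0, j) = 0
    best_last_col = prev[m]   # running maximum over V(i, m)
    for i in range(1, n + 1):
        cur = [0] * (m + 1)   # cur[0] = V(i, 0) = 0
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,
                         prev[j] + space,
                         cur[j - 1] + space)
        best_last_col = max(best_last_col, cur[m])
        prev = cur            # the older row is no longer needed
    # After the loop, prev holds the last row of the table.
    return max(best_last_col, max(prev))


print(end_free_score_linear("ACGTTGCA", "TGCAGGCT"))
```

To report the positions i* and j* as well, one could track the index together with each maximum; that does not change the space bound.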
Peer Itsik
2000-11-20