Motivation

One example where end-spaces should be free is the shotgun sequence assembly procedure. In this problem, one has a large set of partially overlapping subsequences that come from many copies of one original but unknown DNA sequence. The problem is to use comparisons of pairs of subsequences to infer the correct original sequence. Two random subsequences from the set are unlikely to have nearby starting positions along the original sequence, and this is reflected by a low end-space free alignment score for those two subsequences. But if two subsequences do overlap in the original sequence, then there is an alignment of a suffix of one with a prefix of the other that has only a small number of spaces and mismatches. This overlap is detected because it receives a high score in an end-space free weighted alignment. Similarly, the case where one subsequence contains another can be detected in this way. See Figure 2.6 for an illustration.

**Figure 2.6:** sequence assembly

When comparing two sequences, it is not obvious how to position them relative to each other so that their similarity is maximal. One possibility, called the ends-free problem, is to disregard leading and trailing indel operations (in the usual similarity computation, every indel operation reduces the similarity). To implement this, we change the algorithm presented for the global alignment problem as follows:

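Set initial conditions:

$$V(i, 0) = 0, \quad V(0, j) = 0 \quad \text{for all } i, j$$
Use the same recurrence as in the global alignment problem for $1 \le i \le n$, $1 \le j \le m$, where $\sigma$ is the scoring function:

$$V(i, j) = \max \{ V(i-1, j-1) + \sigma(S_i, T_j),\; V(i-1, j) + \sigma(S_i, -),\; V(i, j-1) + \sigma(-, T_j) \}$$
Instead of looking at V(n, m), the algorithm will search for i^* and j^* so that:

$$V(i^*, m) = \max_{0 \le i \le n} V(i, m), \qquad V(n, j^*) = \max_{0 \le j \le m} V(n, j)$$
The similarity will be defined as:

$$\max \{ V(i^*, m),\; V(n, j^*) \}$$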
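
The modified algorithm can be sketched in Python as follows. This is a minimal illustration, not the notes' own implementation: the function name `ends_free_similarity` and the scoring values (match 2, mismatch -1, indel -1) are assumptions chosen for the example, and any scoring function could be substituted.

```python
def ends_free_similarity(s, t, match=2, mismatch=-1, indel=-1):
    """Ends-free similarity of s and t.

    The scoring values are illustrative assumptions, not taken from the notes.
    """
    n, m = len(s), len(t)
    # Initial conditions: V(i, 0) = V(0, j) = 0, so leading indels are free.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,   # align s[i] with t[j]
                          V[i - 1][j] + indel,     # s[i] against a space
                          V[i][j - 1] + indel)     # t[j] against a space
    # Trailing indels are free: take the best of the cells V(i, m) and V(n, j)
    # instead of reading V(n, m).
    i_star = max(range(n + 1), key=lambda i: V[i][m])
    j_star = max(range(m + 1), key=lambda j: V[n][j])
    return max(V[i_star][m], V[n][j_star])
```

With these assumed scores, `ends_free_similarity("AAAAACGT", "CGTGGGGG")` evaluates to 6, the score of the three-letter suffix-prefix overlap `CGT`, and `ends_free_similarity("ACGT", "TTACGTTT")` evaluates to 8, which corresponds to the containment case.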

Searching for i^* and j^* means scanning the last row and the last column of the table produced while computing V(n, m), which removes the penalty for trailing indel operations. Leading indel operations are not penalized either, because of the changed initial conditions.

Complexity:

Time complexity. Computing the matrix takes O(nm) time. Finding i^* and j^* takes O(n + m) time. Therefore the total time complexity remains O(nm).
Space complexity. Computing the matrix values takes O(n + m) space when only the similarity score is needed, since the table can be filled row by row. Finding the maximizing values i^* and j^* requires the last row and column to be saved, which also takes O(n + m) space. Therefore the total space complexity remains O(n + m).
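
The space bound can be illustrated with a score-only variant that keeps a single row of the table at a time and saves the boundary cells V(i, m) as it goes. This is a sketch under the same assumed scoring values (match 2, mismatch -1, indel -1); the function name `ends_free_score_linear_space` is hypothetical, and the variant returns only the similarity value, not the alignment itself.

```python
def ends_free_score_linear_space(s, t, match=2, mismatch=-1, indel=-1):
    """Ends-free similarity using O(n + m) space (score only).

    The scoring values are illustrative assumptions, not taken from the notes.
    """
    n, m = len(s), len(t)
    prev = [0] * (m + 1)        # V(0, j) = 0 for all j
    boundary = [0] * (n + 1)    # saved cells V(i, m); V(0, m) = 0
    for i in range(1, n + 1):
        curr = [0] * (m + 1)    # curr[0] is V(i, 0) = 0
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,
                          prev[j] + indel,
                          curr[j - 1] + indel)
        boundary[i] = curr[m]
        prev = curr             # only the most recent row is kept
    # prev now holds the cells V(n, j); boundary holds the cells V(i, m).
    return max(max(boundary), max(prev))
```

Saving the cells V(i, m) in a list mirrors the description above; keeping a running maximum instead would work equally well for the score.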

Peer Itsik
2000-11-20