Aligning a String to a Profile



Given a database of sequences, we would like to partition it into families of "similar" sequences. For this purpose we would like to capture our knowledge of the common properties of the sequences in a family in a profile of that family (formal definition to follow). Constructing a profile of a family enables us to identify its members and to test whether or not a new sequence belongs to the family. Moreover, searching the database with a profile is more sensitive than searching with a single sequence of the family: with a single sequence we can only look for the database sequences that align best to that one sequence, whereas with a profile we can test for membership in the family as a whole.

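Definition 0.12 For an alignment $S'$ of length $l$, a profile is a $|\Sigma'| \times l$ matrix, where $\Sigma' = \Sigma \cup \{-\}$ is the alphabet extended with the gap symbol, whose columns are probability vectors denoting the frequencies of each symbol in the corresponding alignment column.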
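As an illustration of Definition 0.12, the following is a minimal Python sketch, assuming the alignment $S'$ is given as a list of equal-length strings and that `-` is the gap symbol. The function name `build_profile` is hypothetical, and each column is stored as a dictionary from symbol to frequency rather than as a row of an explicit matrix.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol


def build_profile(alignment: list[str], alphabet: str) -> list[dict[str, float]]:
    """Return one probability vector (symbol -> frequency) per alignment column."""
    if not alignment:
        raise ValueError("alignment must contain at least one row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length l")
    symbols = alphabet + GAP  # extended alphabet: letters plus the gap
    n = len(alignment)
    profile = []
    for j in range(length):
        counts = Counter(row[j] for row in alignment)  # occurrences in column j
        profile.append({s: counts[s] / n for s in symbols})  # Counter gives 0 for absent symbols
    return profile


if __name__ == "__main__":
    for j, column in enumerate(build_profile(["AC-T", "AGGT", "A-GT"], "ACGT")):
        print(j, column)
```

Because gaps are counted as a symbol, every column sums to 1, which is what makes the columns probability vectors.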

Any alignment between a sequence $B$ and a profile $P$ (i.e., after gaps are inserted so that both have the same length $l$) can be evaluated by $\sum_{i=1}^{l} \sigma(B_i, P_i)$, where $B_i$ is the $i$-th symbol of $B$ and $P_i$ is the $i$-th column of $P$. As in pairwise alignment, the key is scoring two positions $x$ and $y$: $\sigma(x,y)$. For a letter $x$ and a column $y$ of a profile, let $\sigma(x,y)$ be the probability of $x$ appearing in column $y$; that is, the score of $x$ depends on the frequency of its occurrences in column $y$. We also need to devise a score for gaps, i.e., for positions where a gap in $B$ faces a profile column or a letter of $B$ faces a gap. With these scores, the usual pairwise dynamic programming algorithm finds the best alignment of a sequence against a profile. To find whether a given sequence is a member of a certain family, we compare it to the family profile in exactly this way.
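The dynamic programming step can be sketched in Python as a global (Needleman-Wunsch style) alignment; the choice of global rather than local alignment is an assumption. It uses $\sigma(x,y)$ = frequency of $x$ in column $y$, as in the text. The two gap scores are assumptions, since the text leaves them open: a gap in $B$ facing column $y$ scores the gap frequency of $y$, and a letter of $B$ facing an inserted gap column scores a constant `insert_score` (a hypothetical parameter, default $-1$). The function name `align_to_profile` is also hypothetical.

```python
GAP = "-"  # assumed gap symbol


def align_to_profile(seq, profile, insert_score=-1.0):
    """Global DP alignment of `seq` against `profile`.

    profile: list of columns, each a dict symbol -> frequency.
    Returns (best score, pairs); each pair is (symbol of seq or GAP,
    profile column index, or None for a gap column inserted opposite a letter).
    """
    n, m = len(seq), len(profile)
    # dp[i][j]: best score aligning seq[:i] with the first j profile columns.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # letters of seq facing inserted gap columns
        dp[i][0] = dp[i - 1][0] + insert_score
        back[i][0] = "up"
    for j in range(1, m + 1):  # gaps in seq facing profile columns
        dp[0][j] = dp[0][j - 1] + profile[j - 1].get(GAP, 0.0)
        back[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            column = profile[j - 1]
            options = [
                (dp[i - 1][j - 1] + column.get(seq[i - 1], 0.0), "diag"),  # sigma(x, y)
                (dp[i - 1][j] + insert_score, "up"),  # sigma(x, -): assumed constant
                (dp[i][j - 1] + column.get(GAP, 0.0), "left"),  # sigma(-, y): assumed gap frequency
            ]
            # max() returns the first maximal option, so ties favor "diag".
            dp[i][j], back[i][j] = max(options, key=lambda option: option[0])
    # Trace back from the bottom-right cell to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((seq[i - 1], j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            pairs.append((seq[i - 1], None))
            i -= 1
        else:
            pairs.append((GAP, j - 1))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


if __name__ == "__main__":
    # Profile of the alignment AC-T / AGGT / A-GT (frequencies per column).
    profile = [
        {"A": 1.0},
        {"C": 1 / 3, "G": 1 / 3, GAP: 1 / 3},
        {"G": 2 / 3, GAP: 1 / 3},
        {"T": 1.0},
    ]
    score, pairs = align_to_profile("ACGT", profile)
    print(score, pairs)
```

Columns are looked up with `dict.get(..., 0.0)`, so a letter that never occurs in a column simply contributes zero, matching its frequency there. How the resulting score is turned into a membership decision is not specified in this section.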


Peer Itsik
2000-12-06