Aligning a String to a Profile

Given a database of sequences, we would like to partition it into families of "similar" sequences. To do so, we capture what we know about the common properties of the sequences in a family in a profile (formal definition to follow). Constructing a profile of a family lets us identify its members and test whether a new sequence belongs to the family. Moreover, searching the database with a profile is more sensitive than searching with a single sequence of the family: with a single sequence, we can only look for the database sequences that align best with that one sequence, whereas with a profile we can test for membership in the family.

Definition 5.15 For an alignment S' of length l, a profile is an $l \times \vert\Sigma \cup \left\{ {-} \right\}\vert$ matrix whose rows are probability vectors: row j gives the frequency of each symbol in column j of the alignment.
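The definition can be illustrated with a minimal Python sketch. The function name `build_profile`, the DNA alphabet, and the representation of each profile row as a dictionary are assumptions made for this example, not part of the notes.

```python
from collections import Counter


def build_profile(alignment, alphabet="ACGT-"):
    """Build a profile from a list of equal-length gapped strings.

    Returns a list with one entry per alignment column; entry j maps each
    symbol of the alphabet (including the gap '-') to its frequency in
    column j. The alphabet is an assumption chosen for illustration.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")

    n_rows = len(alignment)
    profile = []
    for j in range(length):
        # Count how often each symbol occurs in alignment column j.
        counts = Counter(row[j] for row in alignment)
        profile.append({s: counts[s] / n_rows for s in alphabet})
    return profile


# Hypothetical four-sequence alignment of length 3.
example = build_profile(["AC-", "AG-", "TCA", "AC-"])
```

Here `example[0]` describes the first alignment column, in which `A` occurs in three of the four rows and `T` in one.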

Any alignment between a sequence B and a profile P (i.e. with gaps inserted so that both have the same length m) can be evaluated by $\sum_{j=1}^{m} \sigma( b_j, p_j )$ , where $b_j$ and $p_j$ are the j-th positions of the gapped sequence and the gapped profile. The key in pairwise alignment is scoring two positions x and y : $\sigma(x,y)$ . For a letter x and a column y of a profile, let $\sigma(x,y)$ be the probability of x being in column y, that is, the frequency of its occurrences in that column. We also need to devise a score $\sigma(x,-)$ for aligning a letter against a gap. With these scores, dynamic programming finds the best alignment of a sequence against a profile. To decide whether a given sequence is a member of a certain family, we use ordinary pairwise dynamic programming alignment to compare the given sequence to the family profile.
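The dynamic programming step can be sketched as follows, under stated assumptions. A letter aligned to a profile column scores the letter's frequency in that column, as described above. The notes leave the gap scores open, so the choices here are hypothetical: a letter against a gap scores a constant `gap_score`, and a profile column against a gap scores `gap_score` scaled by the fraction of non-gap symbols in that column. The function name `align_to_profile` is also an assumption.

```python
def align_to_profile(seq, profile, gap_score=-1.0):
    """Best global alignment score of seq against profile.

    profile is a list of dicts mapping symbols to frequencies, one dict
    per profile column. gap_score is an assumed penalty; the notes do
    not fix the gap scores.
    """
    n, m = len(seq), len(profile)
    # Assumed score for aligning profile column j to a gap in the sequence:
    # cheaper when the column itself consists mostly of gaps.
    skip = [gap_score * (1.0 - col.get("-", 0.0)) for col in profile]

    # best[i][j] = best score of aligning seq[:i] with profile[:j].
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + gap_score
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + skip[j - 1]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Letter seq[i-1] aligned to profile column j-1.
            match = best[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], 0.0)
            # Letter seq[i-1] aligned to a gap in the profile.
            letter_vs_gap = best[i - 1][j] + gap_score
            # Profile column j-1 aligned to a gap in the sequence.
            column_vs_gap = best[i][j - 1] + skip[j - 1]
            best[i][j] = max(match, letter_vs_gap, column_vs_gap)
    return best[n][m]


# Hypothetical three-column profile; symbols not listed have frequency 0.
demo_profile = [
    {"A": 0.75, "T": 0.25},
    {"C": 0.75, "G": 0.25},
    {"A": 0.25, "-": 0.75},
]
demo_score = align_to_profile("AC", demo_profile)
```

The recurrence is the usual global pairwise one, with the profile playing the role of the second sequence. Recovering the alignment itself, rather than only its score, would require a traceback over `best`, which is omitted here for brevity.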

Itshack Pe'er
1999-03-16