Given a database of sequences, we would like to partition it into
families of "similar" sequences. For this purpose we would like
to capture our knowledge of the common properties of the sequences
in a family in a profile (formal definition to follow).
Constructing a profile of a family enables us to identify its members
and to test whether or not a new sequence belongs to the family. Moreover,
searching the database with a profile is more sensitive than searching with a
single sequence of the family: when searching with a single
sequence, we can only look for the database sequences that align best
with that sequence, whereas with a profile we
can test for membership in the family.
Definition 0.12
For an alignment S' of length l, a profile is a
$|\Sigma'| \times l$ matrix, where $\Sigma'$ is the alphabet together
with the gap symbol "-", whose columns are
probability vectors denoting the frequencies of each symbol in the
corresponding alignment column.
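The definition can be made concrete with a short sketch. It assumes a DNA alphabet "ACGT" plus the gap symbol "-" and stores the profile column by column, as one frequency dictionary per alignment column; the names `ALPHABET` and `build_profile` are hypothetical.

```python
from collections import Counter

# Hypothetical alphabet: DNA letters plus the gap symbol "-".
ALPHABET = "ACGT-"


def build_profile(alignment: list[str]) -> list[dict[str, float]]:
    """Return the profile of an alignment, stored column by column.

    Column j is a probability vector mapping each symbol to its frequency
    in column j of the alignment.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length l")
    n = len(alignment)
    profile = []
    for j in range(length):
        counts = Counter(row[j] for row in alignment)
        profile.append({a: counts[a] / n for a in ALPHABET})
    return profile


alignment = ["AC-T", "AGGT", "ACGT"]
P = build_profile(alignment)
print(P[1])  # column 2: "C" in 2 of 3 rows, "G" in 1
```

Each column sums to 1 as long as every aligned symbol belongs to `ALPHABET`.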
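Any alignment between a sequence B and a profile P (i.e., after gap
insertion both have the same length l) can be evaluated by
$\sum_{j=1}^{l} \sigma(B_j, P_j)$, where $B_j$ is the j-th symbol of the
aligned B, $P_j$ is the j-th column of P, and $\sigma$ is a position score.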
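A minimal sketch of this evaluation for the simplest case, where no gap columns are inserted into the profile, so the gapped sequence B is exactly as long as P. As an assumption matching the description later in this section, the score of a symbol x against a column y is the frequency of x in y, and a gap in B is scored like any other symbol. The profile values and the name `alignment_score` are made up for illustration.

```python
# Hypothetical profile with l = 3 columns over "ACGT" plus the gap symbol "-",
# stored column by column: P[j][a] is the frequency of symbol a in column j.
P = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0, "-": 0.0},
    {"A": 0.0, "C": 0.5, "G": 0.5, "T": 0.0, "-": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.75, "-": 0.25},
]


def alignment_score(b: str, profile: list[dict[str, float]]) -> float:
    """Sum, over the l positions, of the frequency of b's j-th symbol in column j.

    b is the aligned (gapped) sequence; it must have the profile's length l.
    """
    if len(b) != len(profile):
        raise ValueError("aligned sequence and profile must have the same length")
    return sum(column.get(x, 0.0) for x, column in zip(b, profile))


print(alignment_score("AGT", P))  # 1.0 + 0.5 + 0.75 = 2.25
print(alignment_score("AC-", P))  # 1.0 + 0.5 + 0.25 = 1.75
```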
Clearly, using dynamic programming, we can find the best alignment
of a sequence against a profile.
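The key in pairwise alignment is scoring two positions x and y:
$\sigma(x, y)$.
For a letter x and a column y of a
profile, let
$\sigma(x, y)$
be the probability of x being in column y, so the
score of x depends on the frequency of its occurrences in
column y. We also need to devise scores for gaps:
$\sigma(x, -)$ for a letter x aligned against a gap column inserted into
the profile, and $\sigma(-, y)$ for a gap in the sequence aligned against column y.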
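One possible way to fill in these scores is sketched below. The frequency rule for $\sigma(x, y)$ follows the text; the constant `GAP_PENALTY` and the rule that a gap in the sequence costs less in columns where the family already has gaps are assumptions for illustration, not part of the method described here.

```python
from typing import Optional

# Hypothetical gap penalty: the section leaves the gap scores open.
GAP_PENALTY = -1.0


def sigma(x: str, column: Optional[dict[str, float]]) -> float:
    """Score a symbol x of the sequence against a profile column.

    column is a probability vector (symbol -> frequency); None stands for a
    gap column inserted into the profile.
    """
    if column is None:
        # sigma(x, -): a letter opposite an inserted gap column (assumed constant)
        return GAP_PENALTY
    if x == "-":
        # sigma(-, y): a gap in the sequence; assumed to cost less when
        # the family itself often has a gap in column y
        return GAP_PENALTY * (1.0 - column.get("-", 0.0))
    # sigma(x, y): the frequency of x in column y
    return column.get(x, 0.0)


y = {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.75, "-": 0.25}
print(sigma("T", y))     # 0.75
print(sigma("A", y))     # 0.0
print(sigma("-", y))     # -0.75
print(sigma("T", None))  # -1.0
```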
To decide whether a given sequence is a member of a certain
family, we compare the sequence to the family profile using the usual
pairwise dynamic programming alignment.
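The comparison can be sketched as a global (Needleman-Wunsch style) dynamic program over the sequence and the profile columns. It assumes the following scores: the frequency of the letter in the column, a hypothetical constant `GAP_PENALTY` for a letter opposite an inserted gap column, and a gap penalty scaled by how rarely column y contains a gap. `align_to_profile` is a hypothetical name; it returns the best score together with the alignment as (symbol, column index) pairs, where index `None` marks an inserted gap column.

```python
from typing import Optional

Column = dict[str, float]  # one profile column: symbol -> frequency

# Hypothetical gap penalty; the section does not fix a value.
GAP_PENALTY = -1.0


def sigma(x: str, column: Optional[Column]) -> float:
    """Assumed position score (None = gap column inserted into the profile)."""
    if column is None:
        return GAP_PENALTY  # sigma(x, -)
    if x == "-":
        return GAP_PENALTY * (1.0 - column.get("-", 0.0))  # sigma(-, y)
    return column.get(x, 0.0)  # sigma(x, y): frequency of x in column y


def align_to_profile(
    seq: str, profile: list[Column]
) -> tuple[float, list[tuple[str, Optional[int]]]]:
    """Best global alignment of seq against profile by dynamic programming."""
    n, l = len(seq), len(profile)
    # score[i][j]: best score of aligning seq[:i] with the first j columns.
    score = [[0.0] * (l + 1) for _ in range(n + 1)]
    move = [[""] * (l + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + sigma(seq[i - 1], None)
        move[i][0] = "up"
    for j in range(1, l + 1):
        score[0][j] = score[0][j - 1] + sigma("-", profile[j - 1])
        move[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, l + 1):
            options = [
                (score[i - 1][j - 1] + sigma(seq[i - 1], profile[j - 1]), "diag"),
                (score[i - 1][j] + sigma(seq[i - 1], None), "up"),
                (score[i][j - 1] + sigma("-", profile[j - 1]), "left"),
            ]
            # On ties, max keeps the first option, preferring letter vs column.
            score[i][j], move[i][j] = max(options, key=lambda t: t[0])
    # Trace back from the bottom-right cell to recover the alignment.
    pairs: list[tuple[str, Optional[int]]] = []
    i, j = n, l
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            pairs.append((seq[i - 1], j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            pairs.append((seq[i - 1], None))
            i -= 1
        else:
            pairs.append(("-", j - 1))
            j -= 1
    pairs.reverse()
    return score[n][l], pairs


P = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0, "-": 0.0},
    {"A": 0.0, "C": 0.5, "G": 0.5, "T": 0.0, "-": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.75, "-": 0.25},
]
print(align_to_profile("AT", P))  # (0.75, [('A', 0), ('-', 1), ('T', 2)])
```

The table has (n+1)(l+1) cells, each filled in constant time, so the sketch runs in O(nl) time and space.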
Peer Itsik
2000-12-06