Problem Definition

Given a family $\mathcal{S} = (S_1, \ldots, S_k)$ of $k$ sequences that are "similar" to one another, we would like to identify the characteristics common to the whole family. Aligning each pair of sequences from $\mathcal{S}$ separately often does not reveal this common information. A multiple alignment of $\mathcal{S}$ is a new set of sequences $\mathcal{S}' = (S'_1, \ldots, S'_k)$ such that:

1. All the strings in $\mathcal{S}'$ have the same length, which we denote by $l$.
2. Each $S'_i$ is obtained from $S_i$ by inserting spaces (written $-$).
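The two conditions can be checked mechanically. The following is a minimal sketch, assuming sequences are Python strings, that the space is written as the character '-', and that the input sequences themselves contain no '-'. The function name `is_multiple_alignment` is hypothetical, introduced only for illustration.

```python
GAP = "-"  # assumption: the space character is written '-'


def is_multiple_alignment(seqs: list[str], aligned: list[str]) -> bool:
    """Return True if `aligned` is a multiple alignment of `seqs`."""
    if len(seqs) != len(aligned):
        return False
    # Condition 1: every row S'_i has the same length l.
    if len({len(row) for row in aligned}) > 1:
        return False
    # Condition 2: deleting the spaces from S'_i gives back S_i exactly,
    # i.e. S'_i was obtained from S_i only by inserting spaces.
    return all(row.replace(GAP, "") == s for s, row in zip(seqs, aligned))


S = ["ACGT", "AGT", "ACT"]
print(is_multiple_alignment(S, ["ACGT", "A-GT", "AC-T"]))  # True
print(is_multiple_alignment(S, ["ACGT", "AGT-", "ACT"]))   # False: rows differ in length
```

Note that the conditions allow many different multiple alignments of the same family; choosing among them is what the score below is for.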

As with pairwise alignment, we evaluate the quality of a multiple alignment by assigning it a numeric score (see also lecture 3).

Definition 5.1 - The sum of pairs (SP) score of a multiple alignment $\mathcal{M}$ is the sum, over all pairs of sequences in the family, of the scores of the pairwise global alignments induced by $\mathcal{M}$.
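Written as a formula, under the usual reading of "induced": let $\mathcal{M}_{ij}$ (notation introduced here) be the pairwise alignment of $S_i$ and $S_j$ obtained from rows $S'_i$ and $S'_j$ of $\mathcal{M}$ by deleting every column in which both rows contain a space. Then the SP score is a sum of $\binom{k}{2} = k(k-1)/2$ pairwise scores. Writing $S'_i[c]$ (also notation introduced here) for the character in column $c$ of $S'_i$, and using the scoring function $\sigma$ defined in the next paragraph, the assumption $\sigma(-,-) = 0$ means the deleted columns contribute nothing, so the sum can equivalently be taken column by column:

$$
\mathrm{SP}(\mathcal{M}) \;=\; \sum_{1 \le i < j \le k} \mathrm{score}(\mathcal{M}_{ij})
\;=\; \sum_{c=1}^{l} \; \sum_{1 \le i < j \le k} \sigma\big(S'_i[c],\, S'_j[c]\big).
$$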

Let $\sigma(x, y)$ be our scoring function, i.e., the cost of aligning the character $x$ with the character $y$, for $x, y \in \Sigma \cup \{-\}$, where $-$ denotes a space. We assume that, for all $x, y, z \in \Sigma \cup \{-\}$, $\sigma(-, -) = 0$, $\sigma(x, y) = \sigma(y, x)$, and the triangle inequality $\sigma(x, y) \leq \sigma(x, z) + \sigma(z, y)$ holds.
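The sketch below puts the definition and the assumptions together. It is illustrative only: the unit-cost function `sigma` (0 for identical characters, 1 otherwise) is an assumed example, not a scoring function taken from these notes, though it does satisfy all three assumptions. All function names are hypothetical. The SP score is computed twice, once from the induced pairwise alignments as in Definition 5.1 and once column by column, and the two agree because $\sigma(-,-) = 0$.

```python
from itertools import combinations, product

GAP = "-"  # assumption: the space character is written '-'


def sigma(x: str, y: str) -> int:
    """Assumed unit-cost scoring function: 0 for equal characters, else 1."""
    return 0 if x == y else 1


def satisfies_assumptions(score, alphabet: str) -> bool:
    """Check sigma(-,-) = 0, symmetry, and the triangle inequality on Sigma + {-}."""
    chars = alphabet + GAP
    return (score(GAP, GAP) == 0
            and all(score(x, y) == score(y, x)
                    for x, y in product(chars, repeat=2))
            and all(score(x, y) <= score(x, z) + score(z, y)
                    for x, y, z in product(chars, repeat=3)))


def induced_pair(a: str, b: str) -> list[tuple[str, str]]:
    """Pairwise alignment induced by rows a and b: drop all-space columns."""
    return [(x, y) for x, y in zip(a, b) if not (x == GAP and y == GAP)]


def sp_score(aligned: list[str], score=sigma) -> int:
    """Definition 5.1: sum of the scores of all induced pairwise alignments."""
    return sum(sum(score(x, y) for x, y in induced_pair(a, b))
               for a, b in combinations(aligned, 2))


def sp_score_by_columns(aligned: list[str], score=sigma) -> int:
    """Same value summed column by column; relies on score('-', '-') == 0."""
    return sum(score(col[i], col[j])
               for col in zip(*aligned)
               for i, j in combinations(range(len(col)), 2))


M = ["A-C", "A-G", "AT-"]  # rows 1 and 2 share an all-space column
print(satisfies_assumptions(sigma, "ACGT"))   # True
print(sp_score(M), sp_score_by_columns(M))    # 5 5
```

The column-by-column form is usually the more convenient one to compute, since it needs a single pass over the $l$ columns rather than building $\binom{k}{2}$ separate induced alignments.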

Itshack Pe'er
1999-03-16