
     
Multiple Alignment with Consensus

In this section we look at an approximation algorithm for multiple alignment that optimizes a different scoring metric: the consensus error. As before, we assume the existence of a pairwise scoring scheme $\sigma$ satisfying the triangle inequality.

Definition 5.4   Given a set of strings $\S$, the consensus error of a string $\bar{S}$ with respect to $\S$ is $E(\bar{S}) = \sum_{S_i \in \S}
D(\bar{S}, S_i)$. Note that $\bar{S}$ need not be in $\S$.
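
As a small illustration of the definition, the sketch below computes the consensus error of a candidate string. It assumes that $D$ is the unit-cost edit distance, which satisfies the triangle inequality; the function names `edit_distance` and `consensus_error` are ours and are not part of the original notes.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance, used here as a stand-in for D."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def consensus_error(candidate: str, strings: Sequence[str]) -> int:
    """E(candidate): the sum of D(candidate, S_i) over all S_i in the set."""
    return sum(edit_distance(candidate, s) for s in strings)


strings = ["ACGT", "AGT", "ACT"]
print(consensus_error("ACGT", strings))  # 0 + 1 + 1 = 2
```

The candidate passed to `consensus_error` may be any string, matching the remark that $\bar{S}$ need not be in $\S$.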

Problem 5.4   Optimal Steiner string.
INPUT: A set of strings $\S$
QUESTION: Find a string S* which minimizes the consensus error E(S*) over all possible strings.

The Steiner string S* attempts to capture the common characteristics of the set of strings $\S$ and reflect them in a single string. We will present an approximation algorithm for the optimal Steiner string problem with a worst-case approximation ratio of 2.

Lemma 5.5   Let $\S$ have k strings, and assume that the scoring scheme $\sigma$ satisfies the triangle inequality. Then there exists a string $\bar{S} \in \S$ such that $\frac{E(\bar{S})}{E(S^{*})} \leq 2 - \frac{2}{k} < 2$ (see e.g. [3] [pp 349-351]).

Proof: For any $\bar{S} \in \S$:

\begin{displaymath}
\begin{array}{rcl}
E(\bar{S}) & = & \sum_{S_i \in \S} D(\bar{S}, S_i) \leq \sum_{S_i \neq \bar{S}} [D(\bar{S}, S^*) + D(S^*, S_i)] \\
 & = & (k-2) \cdot D(\bar{S}, S^*) + D(\bar{S}, S^*) + \sum_{S_i \neq \bar{S}} D(S^*, S_i) \\
 & = & (k-2) \cdot D(\bar{S}, S^*) + E(S^*)
\end{array}
\end{displaymath} (5.5)

If we pick $\bar{S} \in \S$ to be the string closest to S*, then every term $D(S^*, S_i)$ is at least $D(\bar{S}, S^*)$, and so:

\begin{displaymath}E(S^*) = \sum_{S_i \in \S} D(S^*, S_i) \geq k \cdot D(\bar{S},
S^*)
\end{displaymath} (5.6)

The center string $S_c \in \S$ minimizes $\sum_{S_i \in \S}
D(S_c, S_i)$ over all strings in $\S$, and therefore its consensus error is no larger than the consensus error of $\bar{S}$ (the string closest to S*). Combining this with (5.5) and (5.6), we get:

\begin{displaymath}\frac{E(S_c)}{E(S^*)} \leq \frac{E(\bar{S})}{E(S^*)} \leq \frac{(k-2) \cdot D(\bar{S}, S^*) +
E(S^*)}{E(S^*)} \leq \frac{(k-2) \cdot D(\bar{S}, S^*)}{k \cdot D(\bar{S},
S^*)} + 1 = 2 - \frac{2}{k}
\end{displaymath} (5.7)

The proof above uses inequalities (5.5) and (5.6) together with the fact that $E(S_c) \leq E(\bar{S})$. It is worth noting that the Steiner string was defined without reference to an alignment; the only requirement is a distance function that satisfies the triangle inequality. We will next discuss consensus strings that are alignment-motivated.
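
Lemma 5.5 suggests a simple procedure: return the center string, that is, the input string with the smallest consensus error. The sketch below is one possible rendering of that idea rather than a definitive implementation. The names `center_string` and `hamming` are hypothetical, and Hamming distance on equal-length strings is used only as a convenient example of a distance that satisfies the triangle inequality.

```python
from typing import Callable, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def center_string(strings: Sequence[str],
                  dist: Callable[[str, str], int]) -> str:
    """Return the input string S_c that minimizes the sum of dist(S_c, S_i)."""
    def error(candidate: str) -> int:
        return sum(dist(candidate, s) for s in strings)
    return min(strings, key=error)


strings = ["ACGT", "ACGA", "TCGA"]
# Consensus errors: ACGT -> 0+1+2 = 3, ACGA -> 1+0+1 = 2, TCGA -> 2+1+0 = 3
print(center_string(strings, hamming))  # ACGA
```

The search ranges over the k input strings only, so it needs on the order of k^2 distance evaluations, and by the lemma the result has consensus error at most $2 - \frac{2}{k}$ times the optimum whenever `dist` satisfies the triangle inequality.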
Itshack Pe`er
1999-03-16