In this section we look at an
approximation algorithm for multiple alignment that optimizes a
different scoring metric: the consensus error. As before, we
assume the existence of a pairwise scoring scheme $D$
satisfying the triangle inequality.
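Definition 5.4
Given a set of strings $\mathcal{S}$,
the consensus error of a string $\bar{S}$
with respect to $\mathcal{S}$
is
$$E(\bar{S}) = \sum_{S_i \in \mathcal{S}} D(\bar{S}, S_i),$$
where $D$ is the pairwise scoring scheme.
Note that $\bar{S}$
need not be in $\mathcal{S}$.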
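As an illustration of the definition, the following minimal sketch computes the consensus error of a candidate string. It assumes unit-cost edit distance as the pairwise scoring scheme (one possible choice that satisfies the triangle inequality, not necessarily the one intended in these notes); the function names `edit_distance` and `consensus_error` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def consensus_error(candidate: str, strings, dist=edit_distance) -> int:
    """Sum of the distances from `candidate` to every string in the set."""
    return sum(dist(candidate, s) for s in strings)


if __name__ == "__main__":
    strings = ["AC", "AG", "TC"]
    # The candidate may or may not belong to the set.
    print(consensus_error("AC", strings))
    print(consensus_error("AA", strings))
```

Any other distance function satisfying the triangle inequality can be passed as `dist` without changing the rest of the sketch.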
Problem 5.4
Optimal Steiner string.
INPUT: A set of strings $\mathcal{S}$.
QUESTION: Find a string
$S^*$ that minimizes the consensus error $E(S^*)$ over all
possible strings.
The Steiner string $S^*$ attempts to capture the common
characteristics of the set of strings $\mathcal{S}$
and reflect them in a
single string. We will present an approximation algorithm for the
optimal Steiner string problem with a worst-case approximation ratio
of 2.
Lemma 5.5
Let $\mathcal{S}$
have $k$ strings, and assume
that the scoring scheme $D$
satisfies the triangle
inequality. Then there exists a string $\bar{S} \in \mathcal{S}$
such that
$$\frac{E(\bar{S})}{E(S^*)} \le 2 - \frac{2}{k}$$
(see e.g.
[3], pp. 349-351).
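Proof: For any $\bar{S} \in \mathcal{S}$:
$$E(\bar{S}) = \sum_{S_i \in \mathcal{S},\, S_i \ne \bar{S}} D(\bar{S}, S_i) \le \sum_{S_i \in \mathcal{S},\, S_i \ne \bar{S}} \left[ D(\bar{S}, S^*) + D(S^*, S_i) \right] = (k-2)\, D(\bar{S}, S^*) + E(S^*) \tag{5.5}$$
The last equality holds because the sum has $k-1$ terms and
$\sum_{S_i \ne \bar{S}} D(S^*, S_i) = E(S^*) - D(S^*, \bar{S})$.
If we pick $\bar{S}$
such that $\bar{S}$
is the string of $\mathcal{S}$ closest to
$S^*$, then:
$$E(S^*) = \sum_{S_i \in \mathcal{S}} D(S^*, S_i) \ge k\, D(\bar{S}, S^*) \tag{5.6}$$
Dividing (5.5) by $E(S^*)$ and applying (5.6) gives
$$\frac{E(\bar{S})}{E(S^*)} \le 1 + \frac{(k-2)\, D(\bar{S}, S^*)}{E(S^*)} \le 1 + \frac{k-2}{k} = 2 - \frac{2}{k}.$$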
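The center string $S_c$
minimizes $\sum_{S_i \in \mathcal{S}} D(S_c, S_i)$ over all strings of $\mathcal{S}$,
and therefore its consensus error is no larger than
the consensus error of $\bar{S}$
(the string closest to $S^*$). We get:
$$\frac{E(S_c)}{E(S^*)} \le \frac{E(\bar{S})}{E(S^*)} \le 2 - \frac{2}{k} < 2 \tag{5.7}$$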
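The bound can be checked on a small instance. The sketch below is only an illustration under stated assumptions: it uses Hamming distance on equal-length strings as the metric and restricts the search for the Steiner string to strings of that same length, so that the optimum can be found by brute force. The names `hamming`, `center_string` and `brute_force_steiner` are hypothetical helpers, not part of the notes.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (a metric)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def consensus_error(candidate: str, strings) -> int:
    """Sum of the distances from `candidate` to every string in the set."""
    return sum(hamming(candidate, s) for s in strings)


def center_string(strings) -> str:
    """Input string with minimum consensus error (first one on ties)."""
    return min(strings, key=lambda c: consensus_error(c, strings))


def brute_force_steiner(strings, alphabet: str) -> str:
    """Best string of the common length over `alphabet`, by exhaustive search."""
    n = len(strings[0])
    candidates = ("".join(p) for p in product(alphabet, repeat=n))
    return min(candidates, key=lambda c: consensus_error(c, strings))


if __name__ == "__main__":
    strings = ["AAC", "ACA", "CAA"]
    k = len(strings)
    e_center = consensus_error(center_string(strings), strings)
    e_star = consensus_error(brute_force_steiner(strings, "AC"), strings)
    # Integer form of E(S_c) / E(S*) <= 2 - 2/k, avoiding floating point.
    print(k * e_center <= (2 * k - 2) * e_star)
```

On this instance the center string has consensus error 4 and the best length-3 string has consensus error 3, so the ratio equals $2 - 2/k = 4/3$ exactly; the exhaustive search is exponential in the string length and is meant only for tiny examples.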
The argument above uses Lemma 5.5 and the
fact that $E(S_c) \le E(\bar{S})$ for every $\bar{S} \in \mathcal{S}$.
It is worth noting that the Steiner string was defined without
reference to an alignment; the only requirement is a distance function that
satisfies the triangle inequality. We next discuss
consensus strings that are motivated by alignments.
Itshack Pe`er 1999-03-16