We assume that we can identify the genes on a chromosome, and we
discuss a single chromosome. We also assume that all the genes are
distinct. The order of the genes, which may differ between taxa, is
then a permutation of these genes. Thus we will be discussing
sequences of distinct unsigned integers, where each permutation
$\pi = (\pi_1, \ldots, \pi_n)$
represents a different order of the genes. We write this
sequence horizontally, using the terms left and right to
denote directions along it.
Figure 11.1:
Example of reversals; the parts underlined show where the
reversals took place
Problem 11.1
Sorting by reversals.
INPUT: A permutation $\pi$.
QUESTION: Find $d(\pi)$, the reversal distance between $\pi$ and id.
This problem has been investigated in the last few years with the
following results:
Proof: On the one hand, a reversal can remove at most two breakpoints, so
$d(\pi) \ge b(\pi)/2$, where $b(\pi)$ is the number of breakpoints of $\pi$;
on the other hand, at most $n$ reversals suffice to create
any sequence, so $d(\pi) \le n$.
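As a small illustration of the two bounds in the proof above, the following Python sketch counts breakpoints. It assumes the usual convention, not restated in this section, that the permutation is framed by 0 on the left and n+1 on the right, and that a breakpoint is an adjacent pair whose values differ by more than 1. The function names are illustrative only.

```python
def extend(perm):
    """Frame a permutation of 1..n with 0 on the left and n+1 on the right (assumed convention)."""
    return [0] + list(perm) + [len(perm) + 1]


def count_breakpoints(perm):
    """Count adjacent pairs of the framed permutation that are not consecutive integers."""
    ext = extend(perm)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def distance_bounds(perm):
    """Return (lower, upper) bounds on the reversal distance, following the proof above."""
    b = count_breakpoints(perm)
    lower = (b + 1) // 2  # a reversal removes at most two breakpoints
    upper = len(perm)     # at most n reversals are ever needed
    return lower, upper
```

For example, the permutation 3 4 1 2 has three breakpoints under this convention, so its reversal distance lies between 2 and 4.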
Lemma 11.3
If $\pi$ contains a decreasing strip, there is a reversal that decreases $b(\pi)$
by $k$, $k \ge 1$. Such a reversal is called a good reversal.
Proof:
1. Among all decreasing strips, find the one containing the smallest number, and let K be this number. K is at the right end of its strip.
2. Find K-1 in $\pi$. Since K is the smallest number in any decreasing strip, K-1 must lie in an increasing strip, and is therefore also at the right end of its strip.
3. Reverse the entire segment between these two numbers, so that K and K-1 become adjacent. Joining these two numbers removes a breakpoint (see figure 11.2).
Figure 11.2:
Two possible cases to reduce a breakpoint using a
decreasing strip (K=4).
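The three steps of the proof can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the permutation is framed by 0 and n+1, a strip is a maximal run with no breakpoint inside, and a one-element strip counts as decreasing unless it is one of the two frame elements. The names `strips`, `is_decreasing` and `good_reversal` are hypothetical helpers, not taken from the source.

```python
def strips(ext):
    """Split a framed permutation into maximal runs (start, end) with no breakpoint inside."""
    runs, start = [], 0
    for i in range(1, len(ext) + 1):
        if i == len(ext) or abs(ext[i] - ext[i - 1]) != 1:
            runs.append((start, i - 1))
            start = i
    return runs


def is_decreasing(ext, run):
    """Assumed convention: a one-element strip is decreasing unless it is the frame 0 or n+1."""
    s, e = run
    if s == e:
        return 0 < s < len(ext) - 1
    return ext[s] > ext[e]


def good_reversal(perm):
    """Return perm after one good reversal, or None if perm has no decreasing strip."""
    ext = [0] + list(perm) + [len(perm) + 1]
    dec = [r for r in strips(ext) if is_decreasing(ext, r)]
    if not dec:
        return None
    # Step 1: K is the smallest number in any decreasing strip (the right end of its strip).
    k = min(ext[e] for _, e in dec)
    # Step 2: locate K and K-1 in the framed permutation.
    i, j = ext.index(k), ext.index(k - 1)
    # Step 3: reverse the segment that brings K and K-1 together.
    lo, hi = (i + 1, j) if i < j else (j + 1, i)
    ext[lo:hi + 1] = ext[lo:hi + 1][::-1]
    return ext[1:-1]
```

The two branches in step 3 correspond to K-1 lying to the right of K or to its left. For instance, `good_reversal([4, 3, 1, 2])` reverses the segment `1 2` and `good_reversal([1, 4, 3, 2, 5])` reverses the segment `4 3 2`.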
Lemma 11.3 gives rise to the following
algorithm:
If there exists a decreasing strip, find and perform a good
reversal (the number of breakpoints decreases by at least 1).
Else reverse an increasing strip, thus
creating a decreasing strip (the number of breakpoints does not increase).
This algorithm leads to performance of at most 4 times the
optimum, since it performs at most $2b(\pi)$
reversals, while the optimum is at least $b(\pi)/2$.
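The algorithm above can be sketched as a single loop in Python. This is an illustrative sketch, not a reference implementation: it assumes the framing by 0 and n+1, treats one-element strips as decreasing unless they are frame elements, and, when no decreasing strip exists, simply reverses the first interior strip. The function name `sort_by_reversals` and the choice of returning 1-based inclusive intervals are assumptions made here.

```python
def sort_by_reversals(perm):
    """Greedy sketch: return the reversed intervals (1-based, inclusive) that sort perm."""
    ext = [0] + list(perm) + [len(perm) + 1]
    last = len(ext) - 1
    done = []
    while True:
        # Positions i such that (ext[i-1], ext[i]) is a breakpoint.
        cuts = [i for i in range(1, last + 1) if abs(ext[i] - ext[i - 1]) != 1]
        if not cuts:
            return done  # no breakpoints left: the permutation is sorted
        # Strips are the maximal runs between consecutive breakpoints.
        runs = list(zip([0] + cuts, [c - 1 for c in cuts] + [last]))
        # Assumed convention: singletons are decreasing, except the frame elements.
        dec = [(s, e) for s, e in runs
               if (s == e and 0 < s < last) or ext[s] > ext[e]]
        if dec:
            # Good reversal: join K, the smallest element of a decreasing strip, with K-1.
            k = min(ext[e] for _, e in dec)
            i, j = ext.index(k), ext.index(k - 1)
            lo, hi = (i + 1, j) if i < j else (j + 1, i)
        else:
            # No decreasing strip: reverse an interior (increasing) strip to create one.
            lo, hi = runs[1]
        ext[lo:hi + 1] = ext[lo:hi + 1][::-1]
        done.append((lo, hi))
```

On the permutation 3 4 1 2, which has three breakpoints and no decreasing strip, this sketch first reverses the strip `3 4` and then performs two good reversals, staying within the bound of $2b(\pi)$ reversals.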
Lemma 11.4
[17]
If every reversal that removes a
breakpoint results in a permutation without any decreasing strip,
then there exists a reversal that removes 2 breakpoints.
Proof: Let $\pi$
be the input permutation. Assume
that every reversal that removes a breakpoint results in a
permutation without a decreasing strip. We use the following
notation:
$k$ - the smallest element in a decreasing strip
$l$ - the greatest element in a decreasing strip
$k-1$ must be to the left of $k$; otherwise the reversal that brings
$k-1$ next to $k$ reverses the strip that includes $k-1$, thus removing a
breakpoint while still maintaining a decreasing strip, the one that
includes $k$ (see figure 11.3, top).
Similarly, $l+1$ must be to the right of $l$
(see figure 11.3, bottom).
Figure 11.3:
Two impossible scenarios
Consider the interval between $k-1$ and $k$ along $\pi$,
calling it $\rho_k$ (including $k$ but not including $k-1$);
and the interval between $l$ and $l+1$,
calling it $\rho_l$ (including $l$ but not including $l+1$)
(see figure 11.4).
Figure 11.4:
A situation where the two strips do not overlap.
$\rho_k$ and $\rho_l$ must overlap, otherwise we can reverse just
one of them, leaving a decreasing strip in the other. Similarly,
neither of $\rho_k$, $\rho_l$ contains the other, nor can
be to the left of
.
The only remaining case is (see figure 11.5):
(11.2)
(11.3)
Figure 11.5:
The remaining case where the two strips overlap.
If $\rho_k \setminus \rho_l$ contains a decreasing strip, then
reversing the entire $\rho_l$ interval leaves us a decreasing
strip. Furthermore, if $\rho_k \setminus \rho_l$ contains an
increasing strip, then reversing the entire $\rho_k$ interval
leaves us a decreasing strip. Hence,
$\rho_k \setminus \rho_l = \emptyset$.
Similarly, $\rho_l \setminus \rho_k = \emptyset$,
implying that $\rho_k = \rho_l$.
Reversing $\rho_k$ is therefore a
reversal that removes two breakpoints.
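Lemma 11.4 gives rise to the following
algorithm:
For as long as possible, either:
1. Perform a good reversal using a decreasing strip, resulting in a permutation with a decreasing strip (this removes at least one breakpoint).
Or, if no such reversal exists:
2. Perform a reversal that removes two breakpoints, and then reverse any strip to create a decreasing strip again.
This algorithm leads to performance of at most 2 times the
optimum, since each reversal removes at least one breakpoint
on the average: at most $b(\pi)$ reversals are used, while the optimum is at least $b(\pi)/2$.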