
   
Unsigned Permutations

We assume that the genes on a chromosome can be identified, and we restrict the discussion to a single chromosome. We also assume that all the genes are distinct. The order of the genes, which may differ between taxa, is then a permutation of these genes. We therefore work with sequences of distinct unsigned integers, where each permutation $\pi = (\pi_1 \ldots \pi_n)$ represents one order of the genes. We write this sequence horizontally and use the terms left and right to denote directions along it.
\begin{dfn}{\rm A $reversal$\space takes a contiguous subsequence and
reverses its order, for example 12345 $\rightarrow$\space 14325 (the segment 234 is reversed).} \end{dfn}
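As a minimal sketch of the definition above, a reversal can be written as a slice operation. The function name `reverse_segment` and the 0-based inclusive indices are our own choices for illustration, not notation from the lecture.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with perm[i..j] (0-based, inclusive) reversed."""
    if not 0 <= i <= j < len(perm):
        raise ValueError("need 0 <= i <= j < len(perm)")
    # Keep the prefix and suffix, flip the middle segment.
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


# The example from the definition: reversing the segment 2 3 4 of 1 2 3 4 5.
print(reverse_segment((1, 2, 3, 4, 5), 1, 3))  # (1, 4, 3, 2, 5)
```

The function works on tuples and lists alike, and leaves its argument unchanged.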
 
\begin{dfn}% latex2html id marker 1365
{\rm The $reversal$\space $distance$\spac...
...rm one sequence into
another (see figure \ref{lec11:Fig:Reversals}).} \end{dfn}

  
Figure 11.1: Example of reversals; the parts underlined show where the reversals took place
\framebox{
\begin{minipage}{\textwidth}
\begin{tabbing}
\ \ \ \ \= \ \ \ \ \= ...
... ,5,2,3) \\
$\pi _{4}$\space = (6,4,1,5,2,3)
\end{tabbing}
\end{minipage}
}

Problem 11.1   Sorting by reversals.
INPUT: A permutation $\pi $.
QUESTION: Find $d(\pi)$, the reversal distance between $\pi$ and the identity permutation $id = (1 \ldots n)$.

This problem has been investigated in the last few years with the following results:
1. 2-approximation algorithm [17]
2. 1.75-approximation algorithm [3]
3. NP-completeness proof [6]
4. 1.5-approximation algorithm [8]
 
\begin{dfn}{\rm A $breakpoint$\space is any place in the sequence
where two adj...
...n the sequence 123654 there is
a breakpoint between the 3 and the 6.} \end{dfn}
We denote the number of breakpoints in $\pi $ by $b(\pi)$. When performing a reversal, transforming $\pi $ into $\pi '$, we denote $b(\pi') - b(\pi)$ by $\Delta b$.
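The following sketch counts breakpoints and computes $\Delta b$ for a given reversal. It assumes the common convention of framing the permutation with 0 on the left and $n+1$ on the right, and of calling a pair of neighbours a breakpoint when they are not consecutive integers; that convention is our assumption here. The names `breakpoints` and `delta_b` are illustrative.

```python
def breakpoints(perm):
    """Count breakpoints of perm, a permutation of 1..n.

    Assumption: the sequence is framed by 0 and n+1, and two neighbours
    form a breakpoint when they do not differ by exactly 1.
    """
    framed = [0, *perm, len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def delta_b(perm, i, j):
    """Change in the breakpoint count when perm[i..j] (0-based, inclusive) is reversed."""
    after = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
    return breakpoints(after) - breakpoints(perm)


pi = (1, 2, 3, 6, 5, 4)
print(breakpoints(pi))    # 2: between 3 and 6, and between 4 and the right frame 7
print(delta_b(pi, 3, 5))  # -2: reversing 6 5 4 yields the identity
```

Under the framing assumption, $b(\pi) = 0$ exactly when $\pi$ is the identity.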

Theorem 11.2   [17]
$$\frac{b(\pi)}{2} \leq d(\pi) \leq n \qquad (11.1)$$

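Proof: On the one hand, a reversal changes only the two adjacencies at the ends of the reversed segment, so it can remove at most two breakpoints; since the identity has no breakpoints, at least $b(\pi)/2$ reversals are needed. On the other hand, at most $n$ reversals suffice to create any sequence, since a single reversal can bring each element in turn to its final position.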
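The bounds of Theorem 11.2 can be checked by brute force on small permutations. The sketch below computes $d(\pi)$ exactly by breadth-first search over all reversals, which is only feasible for small $n$. The breakpoint count assumes framing by 0 and $n+1$, and the function names are our own.

```python
from collections import deque
from itertools import permutations


def breakpoints(perm):
    """Breakpoints of perm, framed by 0 and n+1 (assumed convention)."""
    framed = [0, *perm, len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def reversal_distance(perm):
    """Exact d(perm) by breadth-first search; exponential, small n only."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        n = len(cur)
        # Try every reversal of a segment of length at least 2.
        for i in range(n):
            for j in range(i + 1, n):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise AssertionError("the identity is always reachable")


# Check b(pi)/2 <= d(pi) <= n for every permutation of 1..5.
for p in permutations(range(1, 6)):
    d = reversal_distance(p)
    assert breakpoints(p) / 2 <= d <= len(p)
```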
\begin{dfn}{\rm A $strip$\space is a maximal subsequence without
breakpoints. F...
...trip ''2 3'' is increasing, whereas the
strip ''7 6'' is decreasing.} \end{dfn}
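A small sketch of how a permutation can be split into strips. It assumes framing by 0 and $n+1$, and the usual convention that a single-element strip counts as decreasing unless it is one of the two frame elements; both conventions are our assumptions. The example permutation is hypothetical, chosen so that it contains the strips "2 3" and "7 6".

```python
def strips(perm):
    """Split the framed permutation into strips, returned as (elements, kind) pairs.

    Assumptions: the sequence is framed by 0 and n+1; a single-element strip
    is "dec" unless it consists of a frame element, which is "inc".
    """
    n = len(perm)
    framed = [0, *perm, n + 1]
    result = []
    start = 0
    for k in range(1, len(framed) + 1):
        # A strip ends at the end of the sequence or at a breakpoint.
        if k == len(framed) or abs(framed[k] - framed[k - 1]) != 1:
            run = framed[start:k]
            if len(run) > 1:
                kind = "inc" if run[1] > run[0] else "dec"
            else:
                kind = "inc" if run[0] in (0, n + 1) else "dec"
            result.append((run, kind))
            start = k
    return result


# Framed sequence: 0 | 2 3 | 1 | 7 6 | 4 5 | 8
for run, kind in strips((2, 3, 1, 7, 6, 4, 5)):
    print(run, kind)
```

Because the elements are distinct, every run without breakpoints is monotone, so comparing its first two elements is enough to classify it.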

Lemma 11.3   If $\pi \neq id$ contains a decreasing strip, then there is a reversal that decreases $b(\pi)$ by $k$, where $k \geq 1$. Such a reversal is called a good reversal.

Proof:
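1. Among all elements that lie in decreasing strips, let K be the smallest. K is at the right end of its strip.
2. Find (K-1) in $\pi $. Since K is minimal, (K-1) cannot lie in a decreasing strip, so it lies in an increasing strip, and it is at the right end of that strip (otherwise K would follow it in the same strip).
3. Reverse the segment lying between these two numbers, so that K and (K-1) become adjacent: if (K-1) is to the left of K, reverse from the element just after (K-1) through K; if it is to the right, reverse from the element just after K through (K-1). Joining these two numbers removes a breakpoint (see figure 11.2).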
  
Figure 11.2: Two possible cases to reduce a breakpoint using a decreasing strip (K=4).
\framebox{
\begin{minipage}{\textwidth}
\begin{tabbing}
\ \ \ \ \= \ \ \ \ \= ...
...w$\space 2 3 4 5 6 7 $\rightarrow$\space \ldots
\end{tabbing}
\end{minipage}
}

Lemma 11.3 gives rise to the following algorithm: if there exists a decreasing strip, find and perform a good reversal ($\Delta b \leq -1$). Otherwise, reverse an increasing strip, thus creating a decreasing strip ($\Delta b=0$). This algorithm uses at most $2b(\pi)$ reversals, because each removed breakpoint costs at most two reversals. Since $d(\pi) \geq b(\pi)/2$ by Theorem 11.2, the result is at most 4 times the optimum.
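The algorithm derived from Lemma 11.3 can be sketched as follows. This is an illustrative implementation under stated assumptions, not the lecture's own code: the sequence is framed by 0 and $n+1$, single-element strips count as decreasing unless they are frame elements, and the names `strip_bounds`, `count_breakpoints` and `simple_sort` are ours.

```python
def count_breakpoints(seq):
    """Breakpoints of an already framed sequence."""
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(a - b) != 1)


def strip_bounds(seq):
    """Strips of a framed sequence as (lo, hi, kind), indices inclusive."""
    last = len(seq) - 1
    out, lo = [], 0
    for k in range(1, len(seq) + 1):
        if k == len(seq) or abs(seq[k] - seq[k - 1]) != 1:
            hi = k - 1
            if hi > lo:
                kind = "inc" if seq[hi] > seq[lo] else "dec"
            else:
                # Assumed convention: singletons are decreasing,
                # except the frame elements 0 and n+1.
                kind = "inc" if lo in (0, last) else "dec"
            out.append((lo, hi, kind))
            lo = k
    return out


def simple_sort(perm):
    """Sort perm by reversals; return the list of permutations visited."""
    n = len(perm)
    seq = [0, *perm, n + 1]
    history = [tuple(perm)]
    while seq != list(range(n + 2)):
        runs = strip_bounds(seq)
        dec = [(lo, hi) for lo, hi, kind in runs if kind == "dec"]
        if dec:
            # K is the smallest element of any decreasing strip;
            # it sits at the right end (index hi) of its strip.
            _, k_pos = min(dec, key=lambda r: seq[r[1]])
            prev_pos = seq.index(seq[k_pos] - 1)  # position of K-1
            if prev_pos < k_pos:
                i, j = prev_pos + 1, k_pos
            else:
                i, j = k_pos + 1, prev_pos
        else:
            # No decreasing strip: reverse an increasing strip that
            # contains neither frame element.
            i, j, _ = runs[1]
        seq[i:j + 1] = seq[i:j + 1][::-1]
        history.append(tuple(seq[1:-1]))
    return history


print(simple_sort((3, 1, 2)))  # [(3, 1, 2), (3, 2, 1), (1, 2, 3)]
```

The strips are recomputed from scratch in every round, which keeps the sketch short at the cost of efficiency.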

Lemma 11.4   [17]   If every reversal that removes a breakpoint results in a permutation without any decreasing strip, then there exists a reversal that removes 2 breakpoints.

Proof: Let $\pi = \pi_1 \ldots \pi_n$ be the input permutation. Assume that every reversal that removes a breakpoint results in a permutation without a decreasing strip. We use the following notation: $\pi_i$ is the smallest element in a decreasing strip, and $\pi_j$ is the greatest element in a decreasing strip.

$(\pi_{i}-1)$ must be to the left of $\pi_i$; otherwise we could reverse the strip that includes $(\pi_{i}-1)$, removing a breakpoint while still maintaining a decreasing strip, namely the one that includes $\pi_i$ (see figure 11.3, top). Similarly, $(\pi_{j}+1)$ must be to the right of $\pi_j$ (see figure 11.3, bottom).
  
Figure 11.3: Two impossible scenarios
\framebox{ \begin{minipage}{\textwidth}
\begin{tabbing}
\ \ \ \ \= \ \ \ \ \= ...
...pace \ldots \ldots
$\leftarrow$\space $\pi_j$ 
\end{tabbing}
\end{minipage}
}

Let $\rho_j$ denote the interval between $\pi_j$ and $(\pi_{j}+1)$ along $\pi $, including $\pi_j$ but not $(\pi_{j}+1)$, and let $\rho_i$ denote the interval between $(\pi_{i}-1)$ and $\pi_i$, including $\pi_i$ but not $(\pi_{i}-1)$ (see figure 11.4).
  
Figure 11.4: A situation where the two strips do not overlap.

\fbox{\epsfig{figure=lec11_fig/lec11_Strip1.eps,width=10cm}}




$\rho_j$ and $\rho_i$ must overlap; otherwise we could reverse just one of them, leaving a decreasing strip in the other. Similarly, neither of $\rho_j$, $\rho_i$ contains the other, nor can $\pi_j$ be to the left of $(\pi_{i}-1)$. The only remaining case is (see figure 11.5):
$$(\pi_{j}+1)\notin\rho_i, \quad \pi_j\in\rho_i \qquad (11.2)$$
$$(\pi_{i}-1)\notin\rho_j, \quad \pi_i\in\rho_j \qquad (11.3)$$


  
Figure 11.5: The remaining case where the two strips overlap.

\fbox{\epsfig{figure=lec11_fig/lec11_Strip2.eps,width=10cm}}




If $\rho_i \setminus \rho_j$ contains a decreasing strip, then reversing the entire $\rho_j$ interval leaves a decreasing strip. Likewise, if $\rho_i \setminus \rho_j$ contains an increasing strip, then reversing the entire $\rho_i$ interval leaves a decreasing strip. Both cases contradict the assumption; hence $\rho_i \setminus \rho_j = \emptyset$. Similarly, $\rho_j \setminus \rho_i = \emptyset$, implying that $\rho_j = \rho_i$. Reversing $\rho_j = \rho_i$ is therefore a reversal that removes two breakpoints.

Lemma 11.4 gives rise to the following algorithm: For as long as possible, either:
1. Perform a good reversal using a decreasing strip that results in a permutation with a decreasing strip ($\Delta b \leq -1$). Or, if no such reversal exists:
2. Perform a reversal with $\Delta b=-2$, and then reverse any strip ($\Delta b=0$), creating a decreasing strip.

This algorithm leads to performance of at most 2 times the optimum: $\Delta b \leq -1$ per reversal on average, so the number of reversals is about $b(\pi)$, while $d(\pi) \geq b(\pi)/2$ by Theorem 11.2.
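The two-step rule above can be sketched by simply trying all reversals in each round. This brute-force selection is our simplification for illustration, not the procedure of [17]: it prefers a reversal that removes a breakpoint and leaves a decreasing strip, and otherwise takes the reversal with the smallest $\Delta b$, which is $-2$ when Lemma 11.4 applies. The framing by 0 and $n+1$, the treatment of single-element strips as decreasing, and all function names are assumptions.

```python
from itertools import combinations


def count_breakpoints(seq):
    """Breakpoints of an already framed sequence."""
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(a - b) != 1)


def strip_bounds(seq):
    """Strips of a framed sequence as (lo, hi, kind), indices inclusive."""
    last = len(seq) - 1
    out, lo = [], 0
    for k in range(1, len(seq) + 1):
        if k == len(seq) or abs(seq[k] - seq[k - 1]) != 1:
            hi = k - 1
            if hi > lo:
                kind = "inc" if seq[hi] > seq[lo] else "dec"
            else:
                # Assumed convention: singletons are decreasing,
                # except the frame elements.
                kind = "inc" if lo in (0, last) else "dec"
            out.append((lo, hi, kind))
            lo = k
    return out


def has_decreasing(seq):
    return any(kind == "dec" for _, _, kind in strip_bounds(seq))


def reversed_copy(seq, i, j):
    return seq[:i] + seq[i:j + 1][::-1] + seq[j + 1:]


def choose_reversal(seq):
    """Pick (i, j): first a breakpoint-removing reversal that keeps a
    decreasing strip, otherwise the reversal with the smallest delta b."""
    b = count_breakpoints(seq)

    def rank(ij):
        cand = reversed_copy(seq, *ij)
        db = count_breakpoints(cand) - b
        keeps_strip = db < 0 and has_decreasing(cand)
        return (0 if keeps_strip else 1, db)

    inner = range(1, len(seq) - 1)  # never move the frame elements
    return min(combinations(inner, 2), key=rank)


def improved_sort(perm):
    """Sort perm by reversals; return the list of permutations visited."""
    n = len(perm)
    seq = [0, *perm, n + 1]
    history = [tuple(perm)]
    while seq != list(range(n + 2)):
        if has_decreasing(seq):
            i, j = choose_reversal(seq)
        else:
            # Reverse an increasing strip to create a decreasing one.
            i, j, _ = strip_bounds(seq)[1]
        seq = reversed_copy(seq, i, j)
        history.append(tuple(seq[1:-1]))
    return history


print(improved_sort((3, 4, 1, 2)))
```

Each round costs $O(n^3)$ time in this form, so the sketch is meant for checking small cases rather than for real data.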

Itshack Pe`er
1999-03-16