
Unsigned Permutations

We assume that we can identify the genes on a chromosome, and we restrict the discussion to a single chromosome. We also assume that all the genes are distinct. The order of the genes, which may differ between taxa, is then a permutation of these genes. Thus we will be discussing sequences of unsigned, distinct integers, where each permutation $\pi = (\pi_1 \ldots \pi_n)$ represents one order of the genes. We write this sequence horizontally, using the terms left and right to denote directions along it.
Definition A *reversal* is the operation of taking a contiguous subsequence and reversing it, for example 12345 $\rightarrow$ 14325.
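
The definition can be made concrete with a minimal Python sketch. The function name `reverse_segment` and the 0-based, inclusive index convention are choices made here for illustration; they do not come from the lecture.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with perm[i..j] reversed (0-based, inclusive)."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


# The example from the definition: 12345 -> 14325
print(reverse_segment((1, 2, 3, 4, 5), 1, 3))  # (1, 4, 3, 2, 5)
```

Applying the same reversal twice restores the original sequence, so every reversal is its own inverse.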

Definition The *reversal distance* between two sequences is the minimal number of reversals needed to transform one sequence into another (see figure 11.1).

**Figure 11.1:** Example of reversals; the parts underlined show where the reversals took place
$\ldots$
$\pi_{4} = (6,4,1,5,2,3)$

Problem 11.1 Sorting by reversals.
INPUT: A permutation $\pi$.
QUESTION: Find $d(\pi)$, the reversal distance between $\pi$ and the identity permutation id.

This problem has been investigated in the last few years with the following results:

1. 2-approximation algorithm [17]
2. 1.75-approximation algorithm [3]
3. NP-completeness proof [6]
4. 1.5-approximation algorithm [8]

Definition A *breakpoint* is any place in the sequence where two adjacent elements are not consecutive integers. For example, in the sequence 123654 there is a breakpoint between the 3 and the 6.

We denote the number of breakpoints in $\pi$ by $b(\pi)$. When a reversal transforms $\pi$ into $\pi'$, we denote $b(\pi') - b(\pi)$ by $\Delta b$.
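
As an illustration, $b(\pi)$ and $\Delta b$ can be computed as in the sketch below. It assumes, as is common for this problem but not stated explicitly above, that the permutation is framed by 0 on the left and $n+1$ on the right, so that a misplaced first or last element also counts as a breakpoint. The function name `breakpoints` is hypothetical.

```python
def breakpoints(perm):
    """Count breakpoints of perm, a permutation of 1..n.

    Assumption (not stated in the text): perm is framed by 0 on the left
    and n + 1 on the right before adjacent pairs are compared.
    """
    ext = [0, *perm, len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


before = (1, 2, 3, 6, 5, 4)
after = (1, 2, 3, 4, 5, 6)  # result of reversing the segment 6 5 4

print(breakpoints(before))                       # 2
print(breakpoints(after) - breakpoints(before))  # Delta b = -2
```

Under this framing the sequence 123654 has two breakpoints: the one between 3 and 6 mentioned above, and one between 4 and the right frame element 7.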

Theorem 11.2 [17]

$\displaystyle \frac{b(\pi)}{2} \leq d(\pi) \leq n$	(11.1)

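Proof: For the lower bound, a reversal changes only the two adjacencies at the ends of the reversed segment, so it can remove at most two breakpoints. For the upper bound, at most $n$ reversals suffice to create any sequence, for example by moving one element at a time into its final position.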
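
For small $n$ the bounds of Theorem 11.2 can be checked exhaustively. The sketch below is not part of the lecture: it runs a breadth-first search from the identity over all reversals, and it assumes the framing by 0 and $n+1$ when counting breakpoints. The names `breakpoints` and `all_reversal_distances` are hypothetical.

```python
from collections import deque
from itertools import combinations


def breakpoints(perm):
    # Assumption: perm is framed by 0 and n + 1.
    ext = [0, *perm, len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def all_reversal_distances(n):
    """Map every permutation of 1..n to its reversal distance from the identity."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        # Try every reversal of a segment p[i..j] with i < j.
        for i, j in combinations(range(n), 2):
            q = p[:i] + p[i:j + 1][::-1] + p[j + 1:]
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


n = 5
dist = all_reversal_distances(n)
print(all(breakpoints(p) / 2 <= d <= n for p, d in dist.items()))  # True
```

Because a reversal is its own inverse, the distance from the identity to $\pi$ found by the search equals $d(\pi)$. The search visits all $n!$ permutations, so it is only practical for very small $n$.
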
Definition A *strip* is a maximal subsequence without breakpoints. For example, ... the strip "2 3" is increasing, whereas the strip "7 6" is decreasing.
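
A strip decomposition might be computed as in the following sketch. Two conventions are assumed here that the text above does not state: the permutation is framed by 0 and $n+1$, and a single-element strip counts as decreasing unless it is one of the two frame elements. The function name `strips` and the example permutation are hypothetical.

```python
def strips(perm):
    """Split the framed sequence 0, perm, n + 1 into strips.

    Returns (start, end, direction) triples with inclusive indices into the
    framed sequence. Assumed convention: a single-element strip is
    decreasing, except for the frame elements 0 and n + 1.
    """
    ext = [0, *perm, len(perm) + 1]
    result, start = [], 0
    for k in range(1, len(ext) + 1):
        # A strip ends at the end of the sequence or at a breakpoint.
        if k == len(ext) or abs(ext[k] - ext[k - 1]) != 1:
            end = k - 1
            if end > start:
                direction = "inc" if ext[end] > ext[start] else "dec"
            else:
                is_frame = ext[start] in (0, len(ext) - 1)
                direction = "inc" if is_frame else "dec"
            result.append((start, end, direction))
            start = k
    return result


# Framed sequence: 0 | 2 3 | 7 6 | 1 | 4 5 | 8
for start, end, direction in strips((2, 3, 7, 6, 1, 4, 5)):
    print(start, end, direction)
```

In this example the strip "2 3" is reported as increasing, the strip "7 6" as decreasing, and the isolated element 1 as decreasing under the assumed convention.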

Lemma 11.3 If $\pi \neq id$ contains a decreasing strip, there is a reversal that decreases $b(\pi)$ by $k$, $k \geq 1$. Such a reversal is called a good reversal.

Proof:

1. Among all decreasing strips, find the one containing the smallest number, and let $K$ be this number. $K$ is at the right end of its strip.
2. Find $K-1$ in $\pi$. It must be in an increasing strip, since $K$ is the smallest number in any decreasing strip, and it is therefore at the right end of that strip.
3. Reverse the segment between these two numbers so that $K$ and $K-1$ become adjacent: if $K-1$ is to the left of $K$, the segment starts just after $K-1$ and ends at $K$; otherwise it starts just after $K$ and ends at $K-1$. Joining these two numbers removes a breakpoint (see figure 11.2). $\Box$

**Figure 11.2:** Two possible cases to reduce a breakpoint using a decreasing strip (K=4).
$\ldots \rightarrow$ 2 3 4 5 6 7 $\rightarrow \ldots$

Lemma 11.3 gives rise to the following algorithm: if there exists a decreasing strip, find and perform a good reversal ($\Delta b \leq -1$); otherwise reverse an increasing strip, thus creating a decreasing strip ($\Delta b=0$). This algorithm uses at most $2b(\pi)$ reversals, since every reversal of an increasing strip is followed by a good reversal. By Theorem 11.2, $d(\pi) \geq b(\pi)/2$, so the result is at most 4 times the optimum.
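
The algorithm above can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's own code: the permutation is framed by 0 and $n+1$, single-element strips count as decreasing (except the frame elements), and the names `strips` and `greedy_sort` are hypothetical.

```python
def strips(ext):
    """Strips of a framed sequence as (start, end, is_decreasing) triples."""
    out, start, last = [], 0, len(ext) - 1
    for k in range(1, len(ext) + 1):
        if k == len(ext) or abs(ext[k] - ext[k - 1]) != 1:
            end = k - 1
            if end > start:
                dec = ext[end] < ext[start]
            else:
                # Assumed convention: singletons are decreasing,
                # except the frame elements 0 and n + 1.
                dec = ext[start] not in (0, last)
            out.append((start, end, dec))
            start = k
    return out


def greedy_sort(perm):
    """Sort perm by reversals; return the reversals used as (i, j) pairs.

    i and j are 1-based positions in perm, inclusive.
    """
    ext = [0, *perm, len(perm) + 1]
    done = []
    while True:
        s = strips(ext)
        if len(s) == 1:  # no breakpoints left: ext is 0, 1, ..., n + 1
            return done
        dec = [t for t in s if t[2]]
        if dec:
            # Lemma 11.3: K is the smallest element of any decreasing strip;
            # it sits at the right end of its strip.
            k = min(ext[end] for _, end, _ in dec)
            pk, pk1 = ext.index(k), ext.index(k - 1)
            i, j = (pk1 + 1, pk) if pk1 < pk else (pk + 1, pk1)
        else:
            # No decreasing strip: flip an increasing strip that does not
            # touch the frame, which creates a decreasing strip.
            i, j, _ = s[1]
        ext[i:j + 1] = ext[i:j + 1][::-1]
        done.append((i, j))


print(greedy_sort((1, 2, 3, 6, 5, 4)))  # [(4, 6)]
print(greedy_sort((3, 4, 1, 2)))        # [(1, 2), (3, 4), (1, 4)]
```

The second example starts without a decreasing strip, so the first reversal flips the increasing strip "3 4" and the next two are good reversals. It uses 3 reversals for a permutation with 3 breakpoints, within the $2b(\pi)$ bound.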

Lemma 11.4 [17] If every reversal that removes a breakpoint results in a permutation without any decreasing strip, then there exists a reversal that removes 2 breakpoints.

Proof: Let $\pi = \pi_1 \ldots \pi_n$ be the input permutation. Assume that every reversal that removes a breakpoint results in a permutation without a decreasing strip. We use the following notation:

$\pi_i$ - the smallest element in a decreasing strip
$\pi_j$ - the greatest element in a decreasing strip

$(\pi_{i}-1)$ must be to the left of $\pi_i$; otherwise we can reverse the strip that includes $(\pi_{i}-1)$, thus removing a breakpoint while still maintaining a decreasing strip, namely the one that includes $\pi_i$ (see figure 11.3, top). Similarly, $(\pi_{j}+1)$ must be to the right of $\pi_j$ (see figure 11.3, bottom).

**Figure 11.3:** Two impossible scenarios
$\ldots$ $\ldots \ldots \leftarrow \pi_j$

Consider the interval between $\pi_j$ and $(\pi_{j}+1)$ along $\pi$, calling it $\rho_j$ (including $\pi_j$ but not including $(\pi_{j}+1)$), and the interval between $(\pi_{i}-1)$ and $\pi_i$, calling it $\rho_i$ (including $\pi_i$ but not including $(\pi_{i}-1)$); see figure 11.4.

**Figure 11.4:** A situation where the two strips do not overlap.

$\rho_j$ and $\rho_i$ must overlap; otherwise we can reverse just one of them, leaving a decreasing strip in the other. Similarly, neither of $\rho_j$, $\rho_i$ contains the other, nor can $\pi_j$ be to the left of $(\pi_{i}-1)$. The only remaining case is (see figure 11.5):

$\displaystyle (\pi_{j}+1)\notin\rho_i$	$\textstyle \quad$	$\displaystyle \pi_j\in\rho_i$	(11.2)
$\displaystyle (\pi_{i}-1)\notin\rho_j$	$\textstyle \quad$	$\displaystyle \pi_i\in\rho_j$	(11.3)

**Figure 11.5:** The remaining case where the two strips overlap.

If $\rho_i \setminus \rho_j$ contains a decreasing strip, then reversing the entire interval $\rho_j$ leaves us with a decreasing strip. Furthermore, if $\rho_i \setminus \rho_j$ contains an increasing strip, then reversing the entire interval $\rho_i$ leaves us with a decreasing strip. Hence $\rho_i \setminus \rho_j = \emptyset$. Similarly, $\rho_j \setminus \rho_i = \emptyset$, implying that $\rho_j = \rho_i$. Reversing $\rho_j = \rho_i$ is therefore a reversal that removes two breakpoints.

Lemma 11.4 gives rise to the following algorithm: For as long as possible, either:

1. Perform a good reversal using a decreasing strip, resulting in a permutation that still has a decreasing strip ($\Delta b=-1$). Or, if no such reversal exists:
2. Perform a reversal with $\Delta b=-2$, and then reverse any strip ($\Delta b=0$).

This algorithm performs at most 2 times the optimum: since $\Delta b=-1$ per reversal on average, the number of reversals is about $b(\pi)$, while $d(\pi) \geq b(\pi)/2$ by Theorem 11.2.
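
The two-step rule above might be sketched as follows. This is an illustration only: instead of the strip-based construction used in the proof of Lemma 11.4, it searches all reversals by brute force, which is simple but slow. It assumes the framing by 0 and $n+1$ and treats single-element strips as decreasing (except the frame elements). All function names are hypothetical.

```python
from itertools import combinations


def breakpoints(ext):
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def has_decreasing_strip(ext):
    """True if the framed sequence ext has a decreasing strip.

    Assumed convention: an isolated non-frame element is a decreasing strip.
    """
    for k in range(1, len(ext) - 1):
        isolated = abs(ext[k] - ext[k - 1]) != 1 and abs(ext[k + 1] - ext[k]) != 1
        if ext[k - 1] - ext[k] == 1 or isolated:
            return True
    return False


def two_approx_sort(perm):
    """Sort perm by reversals; return (i, j) pairs, 1-based and inclusive."""
    n = len(perm)
    ext = [0, *perm, n + 1]
    ops = []
    while breakpoints(ext) > 0:
        b = breakpoints(ext)
        if has_decreasing_strip(ext):
            best = None
            for i, j in combinations(range(1, n + 1), 2):
                cand = ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]
                delta = breakpoints(cand) - b
                if delta < 0:
                    # Prefer reversals that keep a decreasing strip (step 1);
                    # otherwise take the largest drop, i.e. -2 (step 2).
                    key = (not has_decreasing_strip(cand), delta)
                    if best is None or key < best[0]:
                        best = (key, i, j)
            # Lemma 11.3 guarantees that some candidate was found.
            _, i, j = best
        else:
            # No decreasing strip: reverse the first strip after the frame.
            cuts = [k for k in range(1, n + 2) if abs(ext[k] - ext[k - 1]) != 1]
            i, j = cuts[0], cuts[1] - 1
        ext[i:j + 1] = ext[i:j + 1][::-1]
        ops.append((i, j))
    return ops


print(two_approx_sort((3, 1, 2)))     # [(1, 2), (2, 3)]
print(two_approx_sort((3, 4, 1, 2)))  # [(1, 2), (1, 4), (1, 2)]
```

Each iteration tries all $O(n^2)$ reversals and recounts breakpoints for every candidate, so this version is meant for checking small examples rather than for efficiency.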


Itshack Pe'er
1999-03-16