
Probabilistic Models for Mapping

Recall the following definition:
Definition: A Poisson process of rate $\lambda$ is a counting process $N(t)$ with the following properties:

$N: \mathbb{R}_{0}^{+} \rightarrow \mathbb{N}$ is a non-decreasing function, where $N(t)$ is the number of events up to time $t$.
$N(0) = 0$.
The numbers of events in disjoint intervals are independent.
The number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$:

$\begin{displaymath}P(N(t+s) - N(s) = n) = e^{-\lambda t} \frac {({\lambda t})^{n}} { n! } \mbox{, for } n = 0,1,\ldots \mbox { and } s, t \geq 0 \end{displaymath}$

(10.2)

The distribution of the number of events in an interval is therefore stationary, i.e., it depends only on the length of the interval. The expected number of events in an interval of length $t$ is $E(N(t)) = \lambda t$.

Denote:

$T_{n}$ = time between event $n-1$ and event $n$
$S_{0} = 0$
$S_{n} = \sum_{i=1}^{n} T_{i}$, the time of event $n$

Recall that inter-event times in a Poisson process are i.i.d. random variables, exponentially distributed with parameter $\lambda$ , i.e.,

$\begin{displaymath}Pr(T_{i} > t) = e^{- \lambda t} \end{displaymath}$

(10.3)
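
The link between exponential inter-event times and the count $N(t)$ can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names `count_events` and `mean_count`, the rate, the horizon and the number of trials are all illustrative assumptions. It sums exponential inter-event times $T_i$ until time $t$ is passed and compares the average count with $\lambda t$.

```python
import random


def count_events(rate, t, rng):
    """Count events in [0, t] by accumulating exponential inter-event times."""
    elapsed = rng.expovariate(rate)  # T_1, the time of the first event
    n = 0
    while elapsed <= t:
        n += 1
        elapsed += rng.expovariate(rate)  # add the next inter-event time
    return n


def mean_count(rate, t, trials=20000, seed=0):
    """Average N(t) over independent simulated runs (seed is an assumption)."""
    rng = random.Random(seed)
    return sum(count_events(rate, t, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    rate, t = 2.0, 3.0  # hypothetical values chosen for illustration
    print("simulated E(N(t)):", mean_count(rate, t))
    print("lambda * t:       ", rate * t)
```

With enough trials the two printed values should agree closely, which is the statement $E(N(t)) = \lambda t$.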

If it is known that exactly $n \geq 1$ events occurred in a Poisson process until time $t$, then the event times $\{S_{1},S_{2},\ldots,S_{n} \}$, taken as an unordered set, are distributed as $n$ independent uniform random variables on $[0,t]$.

Assume clone length $L$, genome length $G$, and choose $N$ clones at random. What is the expected fraction of the genome covered by clones? For a random point $b$ and an arbitrary clone $c$, the probability that $b$ is included in $c$ is given by:

$\begin{displaymath}Pr(b \in c) = \frac {L} {G} \end{displaymath}$

(10.4)

and therefore, since the clones are placed independently, the probability that $b$ lies outside all $N$ clones is given by:

$\begin{displaymath}Pr(\forall c : b \notin c) = (1 - \frac {L} {G})^{N} = (1 - \frac {L} {G})^{G \frac {N} {G}} \sim e^{-\frac {NL} {G}} \end{displaymath}$

(10.5)

with the last approximation being valid when $L \ll G$ and $N \ll G$ .
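
The quality of the exponential approximation can be checked with a few lines of arithmetic. This is a small sketch under assumed numbers: the genome length, clone length and clone counts below are hypothetical, and the function names are introduced only for illustration.

```python
import math


def prob_uncovered_exact(N, L, G):
    """Probability that a random point misses all N clones: (1 - L/G)^N."""
    return (1.0 - L / G) ** N


def prob_uncovered_approx(N, L, G):
    """Exponential approximation e^{-NL/G}, intended for L much smaller than G."""
    return math.exp(-N * L / G)


if __name__ == "__main__":
    G, L = 3_000_000, 40_000  # hypothetical genome and clone lengths
    for N in (75, 150, 375):  # hypothetical numbers of clones
        exact = prob_uncovered_exact(N, L, G)
        approx = prob_uncovered_approx(N, L, G)
        print(f"N={N:4d}  exact={exact:.5f}  approx={approx:.5f}")
```

Because $\ln(1-x) < -x$ for $0 < x < 1$, the exact value is always slightly below the approximation, and the gap shrinks as $L/G$ decreases.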
Definition: The fraction

$\begin{displaymath}R=\frac{NL}{G}\end{displaymath}$

is said to be the redundancy of the clone set.
Since a random point is left uncovered with probability approximately $e^{-NL/G}$, the expected fraction of the genome that is not covered is given by

$\begin{displaymath}E(\mbox{fraction not covered}) = e^{-R} \end{displaymath}$

(10.6)

where $R$ is the redundancy of the clone set. Table 10.1 shows that a redundancy factor of 2 to 5 covers roughly 86% to 99% of the genome segment considered.

Table 10.1: Coverage of genome segment depending on redundancy factor

R	Coverage
1	0.63
2	0.865
3	0.95
4	0.98
5	0.993
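
The coverage values $1 - e^{-R}$ in the table can be compared with a direct simulation. The sketch below is illustrative only and rests on stated assumptions: the genome is treated as circular to avoid edge effects, clone starts are drawn uniformly, and the genome length, trial count and seed are arbitrary choices. The function name `simulated_coverage` is hypothetical.

```python
import math
import random


def simulated_coverage(R, G=1000.0, L=1.0, trials=20, seed=0):
    """Average covered fraction of a circular genome of length G.

    N = R * G / L clones of length L start at uniform random positions.
    Between two consecutive starts, the covered length is min(L, gap),
    because all clones have the same length.
    """
    rng = random.Random(seed)
    N = round(R * G / L)
    total = 0.0
    for _ in range(trials):
        starts = sorted(rng.uniform(0.0, G) for _ in range(N))
        covered = 0.0
        for i, s in enumerate(starts):
            # The successor of the last start wraps around the circle.
            nxt = starts[i + 1] if i + 1 < N else starts[0] + G
            covered += min(L, nxt - s)
        total += covered / G
    return total / trials


if __name__ == "__main__":
    for R in (1, 2, 3, 4, 5):
        print(R, round(simulated_coverage(R), 3), round(1 - math.exp(-R), 3))
```

Each printed row shows the redundancy, the simulated coverage and the value of $1 - e^{-R}$ for comparison.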

Measure lengths in units of the clone length, i.e., set the clone length to 1. Denote by $N$ the number of clones and by $R$ the redundancy factor, and assume that the clone starting positions follow a Poisson process with rate $\lambda$; in these units $\lambda = N/G = R$. Two clones are detected as overlapping only if they share at least a fraction $\theta$ of their length, the minimal overlap factor. A set of clones covering a continuous segment of the genome, together with their physical distances, is called a contig. Contigs are sometimes referred to as islands; an apparent island is an island as determined by the detected overlaps.

Theorem 10.5 (Lander and Waterman, 1988 [4]):

1.

The expected number of apparent islands is given by

$\begin{displaymath} Ne^{-R(1-\theta)} \end{displaymath}$

(10.7)

2.

The expected number of apparent islands with exactly $j \geq 1$ clones is given by

$\begin{displaymath}N e^{-2R(1-\theta)} {(1-e^{-R(1-\theta)})}^{(j-1)} \end{displaymath}$

(10.8)

3.

The expected number of clones in an apparent island is given by

$\begin{displaymath}e^{R(1-\theta)} \end{displaymath}$

(10.9)

4.

The expected length of an apparent island is

$\begin{displaymath}\frac {e^{R(1-\theta)} - 1} {R} + \theta \end{displaymath}$

(10.10)
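
The first two items of the theorem can be cross-checked against each other. Summing the expected number of islands with exactly $j$ clones over all $j$ should give the expected number of islands, and weighting each term by $j$ should give back $N$, the total number of clones. Dividing $N$ by the expected number of islands then gives the mean number of clones per island, $e^{R(1-\theta)}$. The sketch below does this numerically; the parameter values and function names are hypothetical, and the infinite sums are truncated at a large $j$.

```python
import math


def expected_islands(N, R, theta):
    """Item 1: expected number of apparent islands, N e^{-R(1-theta)}."""
    return N * math.exp(-R * (1.0 - theta))


def expected_islands_of_size(j, N, R, theta):
    """Item 2: expected number of apparent islands with exactly j clones."""
    a = R * (1.0 - theta)
    return N * math.exp(-2.0 * a) * (1.0 - math.exp(-a)) ** (j - 1)


if __name__ == "__main__":
    N, R, theta = 1000, 3.0, 0.2  # hypothetical parameters
    sizes = range(1, 2000)        # truncation point of the geometric sums
    islands = sum(expected_islands_of_size(j, N, R, theta) for j in sizes)
    clones = sum(j * expected_islands_of_size(j, N, R, theta) for j in sizes)
    print("sum over j of item 2:", islands)
    print("item 1:              ", expected_islands(N, R, theta))
    print("clones accounted for:", clones, "of", N)
    print("clones per island:   ", N / expected_islands(N, R, theta))
```

The geometric sums converge quickly, so the truncation has no visible effect for these parameters.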

Proof: We prove the first item of the theorem. To derive the expected number of apparent islands, define $J(x)$ as follows:

$\begin{eqnarray*} J(x) &= & Pr(\mbox{two points } a, b = a + x \mbox{ are not covered by a common clone}) \\ &= & Pr(\mbox{there are no left-end points in the interval } [b-1,a]) \end{eqnarray*}$

Since $[b-1,a]$ is of length $1-x$ and left-end points occur at rate $R$, $J(x)$ is the probability of zero events in an interval of that length:

$\begin{displaymath}J(x) = \begin{cases} e^{-R(1-x)} & 0 \leq x \leq 1, \\ 1 & \text{otherwise}. \end{cases} \end{displaymath}$

(10.11)

The number of islands equals the number of clones whose right end is reached without detecting an overlap with a further clone. Let $E_c$ denote the event that clone $c$ is the right-hand clone of an island. If the right-hand end of $c$ is at a point $t$, we require that $t$ and $t - \theta$ are not covered by a common clone (other than $c$). The probability of such an event $E_c$ is given by:

$\begin{displaymath}P(E_c) = J(\theta) \end{displaymath}$

(10.12)

and the expected number of apparent islands is therefore given by:

$\begin{displaymath}E(\mbox{number of apparent islands}) = N \cdot J(\theta) = Ne^{-R(1-\theta)} \end{displaymath}$

(10.13)
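
The island count derived in the proof can be compared with a simulation. The following is a rough sketch under stated assumptions: clone length is 1, clone starts are drawn uniformly on $[0, G]$ as a stand-in for the Poisson process, and the values of $N$, $G$, $\theta$, the trial count and the seed are arbitrary. The function names are hypothetical. A clone ends an apparent island exactly when the next clone to its right starts more than $1 - \theta$ later, since the next clone is the one with the largest overlap.

```python
import math
import random


def count_apparent_islands(starts, theta):
    """Count apparent islands for unit-length clones with given left ends.

    Consecutive clones (by start) are linked when their overlap is at
    least theta, i.e. when their starts differ by at most 1 - theta.
    """
    ordered = sorted(starts)
    if not ordered:
        return 0
    islands = 1
    for left, right in zip(ordered, ordered[1:]):
        if right - left > 1.0 - theta:
            islands += 1  # no detectable overlap: a new island begins
    return islands


def mean_islands(N, G, theta, trials=50, seed=0):
    """Average island count over independent random clone placements."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        starts = [rng.uniform(0.0, G) for _ in range(N)]
        total += count_apparent_islands(starts, theta)
    return total / trials


if __name__ == "__main__":
    N, G, theta = 2000, 1000.0, 0.25  # hypothetical; redundancy R = N / G = 2
    R = N / G
    print("simulated:", mean_islands(N, G, theta))
    print("predicted:", N * math.exp(-R * (1.0 - theta)))
```

The simulated mean should lie close to $Ne^{-R(1-\theta)}$, with small deviations due to sampling noise and to fixing $N$ rather than drawing it from a Poisson distribution.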


Itshack Pe'er
1999-03-21