
   
Probabilistic Models for Mapping

Recall the following definition:
\begin{dfn}{\rm A {\em Poisson process} of rate $\lambda$
is described by: } \end{dfn}
Denote by $T_{i}$ the time between the $(i-1)$-th and the $i$-th event. Recall that the inter-event times in a Poisson process are i.i.d. random variables, exponentially distributed with parameter $\lambda$, i.e.,

\begin{displaymath}Pr(T_{i} > t) = e^{- \lambda t}
\end{displaymath} (10.3)
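The survival formula above can be checked numerically. The following is a minimal sketch, assuming an illustrative rate and threshold (the function name `empirical_survival` and the values `rate = 2.0`, `t = 0.5` are hypothetical, not taken from the lecture):

```python
import math
import random


def empirical_survival(rate, t, samples=100_000, seed=0):
    """Estimate Pr(T > t) for exponential inter-event times with the given rate."""
    rng = random.Random(seed)
    # random.expovariate(rate) draws an exponential variate with mean 1/rate.
    exceed = sum(1 for _ in range(samples) if rng.expovariate(rate) > t)
    return exceed / samples


if __name__ == "__main__":
    rate, t = 2.0, 0.5  # hypothetical values chosen for illustration
    print("empirical:", empirical_survival(rate, t))
    print("formula  :", math.exp(-rate * t))
```

The two printed numbers should agree up to sampling noise.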

If it is known that $n \geq 1$ events occurred in a Poisson process until time $t$, then the arrival times $\{S_{1},S_{2},...,S_{n} \}$, taken as an unordered set, are distributed uniformly and independently in $[0,t]$.

Assume clone length $L$, genome length $G$, and choose $N$ clones at random. What is the expected fraction of the genome covered by clones? For a random point $b$ and an arbitrary clone $c$, the probability that $b$ is included in $c$ is given by:

\begin{displaymath}Pr(b \in c) = \frac {L} {G}
\end{displaymath} (10.4)

and therefore, since the clones are chosen independently, the probability that $b$ is not covered by any clone is given by:

\begin{displaymath}Pr(\forall c : b \notin c) = (1 - \frac {L} {G})^{N} = (1 - \frac
{L} {G})^{G \frac {N} {G}} \sim e^{-\frac {NL} {G}}
\end{displaymath} (10.5)

with the last approximation being valid when $L \ll G$ and $N \ll
G$.  
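To get a feel for how close the exponential approximation is, the exact power and its limit can be compared directly. This is a small sketch under assumed numbers (the genome length, clone length, and clone count below are hypothetical and chosen only so that $NL/G = 4$):

```python
import math


def uncovered_exact(N, L, G):
    """Exact probability that a random point is missed by all N clones."""
    return (1.0 - L / G) ** N


def uncovered_approx(N, L, G):
    """Exponential approximation e^{-NL/G} of the same probability."""
    return math.exp(-N * L / G)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    G, L, N = 3_000_000, 40_000, 300
    print("exact :", uncovered_exact(N, L, G))
    print("approx:", uncovered_approx(N, L, G))
```

The gap between the two values shrinks as $L/G$ decreases with $NL/G$ held fixed.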
\begin{dfn}{\rm The fraction} \end{dfn}

\begin{displaymath}R=\frac{NL}{G}\end{displaymath}

is said to be the redundancy of the clone set.  
\begin{dfn}{\rm The expected fraction of non-covered
genome} \end{dfn}
is given by

\begin{displaymath}E(\mbox{fraction not covered}) = e^{-R}
\end{displaymath} (10.6)

where $R$ is the redundancy of the clone set. Table 10.1 shows that a redundancy factor of 2 to 5 covers roughly 86\% to 99\% of the genome segment considered.
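The expected non-covered fraction can also be estimated by simulation. The sketch below is not part of the lecture: it assumes a circular genome (to avoid edge effects) and clone left ends drawn uniformly at random, and the function name `uncovered_fraction` and the parameter values are hypothetical.

```python
import math
import random


def uncovered_fraction(N, L, G, trials=10_000, seed=1):
    """Monte Carlo estimate of the probability that a random point is uncovered."""
    rng = random.Random(seed)
    uncovered = 0
    for _ in range(trials):
        b = rng.uniform(0, G)  # random point on the genome
        # Assumption: circular genome. A clone with left end s covers b
        # exactly when (b - s) mod G < L. Each trial draws N fresh clones.
        if all((b - rng.uniform(0, G)) % G >= L for _ in range(N)):
            uncovered += 1
    return uncovered / trials


if __name__ == "__main__":
    N, L, G = 200, 1.0, 100.0  # hypothetical values giving R = NL/G = 2
    print("simulated:", uncovered_fraction(N, L, G))
    print("e^{-R}   :", math.exp(-N * L / G))
```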






 
Table 10.1: Coverage of genome segment depending on redundancy factor

\begin{tabular}{cc}
Redundancy $R$ & Coverage $1 - e^{-R}$ \\
\hline
1 & 0.63 \\
2 & 0.865 \\
3 & 0.95 \\
4 & 0.98 \\
5 & 0.993 \\
\end{tabular}
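The table entries follow from the coverage formula $1 - e^{-R}$, and the formula can be inverted to ask what redundancy a target coverage requires. A minimal sketch (the helper names `coverage` and `redundancy_for` are illustrative, not from the lecture):

```python
import math


def coverage(R):
    """Expected covered fraction of the genome at redundancy R."""
    return 1.0 - math.exp(-R)


def redundancy_for(target):
    """Redundancy needed to reach a target coverage in (0, 1)."""
    return -math.log(1.0 - target)


if __name__ == "__main__":
    for R in range(1, 6):
        print(R, round(coverage(R), 3))
    # Redundancy needed for 99% expected coverage.
    print(redundancy_for(0.99))
```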



Measure lengths in units of the clone length, so that the clone length is 1, and denote by $N$ the number of clones and by $R$ the redundancy factor. Assume that the clone starting positions follow a Poisson process with rate $\lambda$; in these units $\lambda = N/(G/L) = R$. We define $\theta$ as the minimal fraction of a clone's length that two clones must share for their overlap to be detected.

A set of clones covering a contiguous segment of the genome, together with their physical distances, is called a contig. Contigs are sometimes referred to as islands. An apparent island is a maximal set of clones linked by detected overlaps.

Theorem 10.5   Lander-Waterman 1988 [4]:

1.
The expected number of apparent islands is given by

 \begin{displaymath}
Ne^{-R(1-\theta)}
\end{displaymath} (10.7)

2.
The expected number of apparent islands with exactly $j \geq 1$ clones is given by

\begin{displaymath}N e^{-2R(1-\theta)} {(1-e^{-R(1-\theta)})}^{(j-1)}
\end{displaymath} (10.8)

3.
The expected number of clones in an apparent island is given by

\begin{displaymath}e^{R(1-\theta)}
\end{displaymath} (10.9)

4.
The expected length of an apparent island is

\begin{displaymath}\frac {e^{R(1-\theta)} - 1} {R} + \theta
\end{displaymath} (10.10)
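The four quantities of the theorem can be evaluated together as a sanity check. The sketch below assumes the standard Lander-Waterman forms, with $e^{R(1-\theta)}$ for the expected number of clones per island and $(e^{R(1-\theta)} - 1)/R + \theta$ for the expected island length in units of clone length; the names `IslandStats`, `island_stats`, and `islands_with_j_clones` and the example parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class IslandStats:
    islands: float            # expected number of apparent islands
    clones_per_island: float  # expected number of clones per apparent island
    island_length: float      # expected island length, in clone lengths


def island_stats(N, R, theta):
    """Evaluate items 1, 3 and 4 for N clones, redundancy R, overlap factor theta."""
    # Probability that a given clone is the right-hand clone of an island.
    p_end = math.exp(-R * (1.0 - theta))
    return IslandStats(
        islands=N * p_end,
        clones_per_island=1.0 / p_end,
        island_length=(1.0 / p_end - 1.0) / R + theta,
    )


def islands_with_j_clones(N, R, theta, j):
    """Evaluate item 2: expected number of apparent islands with exactly j clones."""
    p_end = math.exp(-R * (1.0 - theta))
    return N * p_end ** 2 * (1.0 - p_end) ** (j - 1)


if __name__ == "__main__":
    stats = island_stats(N=1000, R=3.0, theta=0.25)  # illustrative values
    print(stats)
    # Consistency check: islands times clones per island recovers N.
    print(stats.islands * stats.clones_per_island)
```

Two identities make useful checks: the number of islands times the number of clones per island equals $N$, and summing item 2 over all $j \geq 1$ gives item 1.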

Proof: We prove the first item of the theorem. To derive the expected number of apparent islands, define $J(x)$ as follows:

\begin{eqnarray*}
J(x) &=& Pr(\mbox{two points } a,\ b = a + x \mbox{ are not covered by a common clone}) \\
     &=& Pr(\mbox{there are no left-end points in the interval } [b-1,a])
\end{eqnarray*}


Since $[b-1,a]$ is of length $1-x$, the number of left-end points falling in it is Poisson distributed with mean $R(1-x)$, so $J(x)$ can be computed from the redundancy factor $R$ as follows:

\begin{displaymath}J(x) =
\begin{cases}
e^{-R(1-x)} & 0 \leq x \leq 1, \\
1 & \text{otherwise}.
\end{cases}
\end{displaymath} (10.11)

The number of islands equals the number of clones from which we exit to the right without detecting an overlap. Let $E_{c}$ denote the event that clone $c$ is the right-hand clone of an island. If the right-hand end of the island is at a point $t$, we require that $t$ and $t - \theta$ are not covered by a common clone (other than $c$). The probability of such an event $E_{c}$ is given by:

\begin{displaymath}Pr(E_{c}) = J(\theta)
\end{displaymath} (10.12)

Each island has exactly one right-hand clone, so by linearity of expectation over the $N$ clones the expected number of apparent islands is:

\begin{displaymath}E(\mbox{number of apparent islands}) = N \cdot J(\theta) =
Ne^{-R(1-\theta)}
\end{displaymath} (10.13)
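The result can be compared against a direct simulation. The following sketch rests on assumptions not stated in the lecture: a linear genome of length `G` measured in clone lengths, clone left ends drawn uniformly at random, and an overlap counted as detected when two consecutive clones share at least a fraction $\theta$ of their length. All names and parameter values are illustrative.

```python
import math
import random


def count_islands(N, G, theta, rng):
    """Count apparent islands for N unit-length clones on a genome of length G."""
    starts = sorted(rng.uniform(0, G) for _ in range(N))
    islands = 1  # the rightmost clone always ends an island
    for left, right in zip(starts, starts[1:]):
        # Consecutive clones overlap by 1 - (right - left); the overlap is
        # detected only if it is at least theta, i.e. the gap is <= 1 - theta.
        if right - left > 1.0 - theta:
            islands += 1
    return islands


def mean_simulated_islands(N, G, theta, runs=20, seed=0):
    """Average the island count over several independent runs."""
    rng = random.Random(seed)
    return sum(count_islands(N, G, theta, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    N, G, theta = 2000, 1000.0, 0.2  # hypothetical values, R = N / G = 2
    R = N / G
    print("simulated:", mean_simulated_islands(N, G, theta))
    print("predicted:", N * math.exp(-R * (1.0 - theta)))
```

Only consecutive left ends need to be compared: if the next clone to the right starts too far away to give a detectable overlap, every later clone starts even farther away.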


Itshack Pe`er
1999-03-21