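Recall the following definition:
A Poisson process of rate $\lambda$ is described by
a non-decreasing function $N: [0, \infty) \to \{0, 1, 2, \ldots\}$,
where
N(t) = number of events until time t
N(0) = 0
The numbers of events in disjoint intervals are independent
$P(N(s+t) - N(s) = k) = e^{-\lambda t} (\lambda t)^k / k!$ for all $s, t \ge 0$ and $k = 0, 1, 2, \ldots$ (2)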
As a consequence, the distribution of the number of events in an interval is stationary,
i.e. it depends only on the length of the interval. The expected number of events
in an interval of length t is given by
$\lambda t$, where $\lambda$ is the rate of the process.
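Denote:
$T_n$ = time between event n-1 and event n
$S_n = T_1 + \cdots + T_n$ = time of event n
$S_0 = 0$
Recall that inter-event times in a Poisson process are i.i.d.
random variables, exponentially distributed with parameter $\lambda$,
i.e.,
$P(T_n > t) = e^{-\lambda t}$ for $t \ge 0$ (3)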
If it is known that $k$ events occurred in a Poisson
process until time t, then the arrival times of these events
(not the inter-arrival times), taken as an unordered set,
are distributed uniformly and independently in [0,t].
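The properties recalled above can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `simulate_event_times` and `mean_event_count` are hypothetical, and the rate and time horizon are arbitrary illustrative values. It builds event times by summing exponential inter-event times and compares the average number of events in [0,t] with the product of the rate and t.

```python
import random


def simulate_event_times(rate, t, rng):
    """Return the sorted event times of a Poisson process of the given rate on [0, t].

    Event times are built by summing i.i.d. exponential inter-event times.
    """
    times = []
    current = rng.expovariate(rate)  # time of the first event
    while current <= t:
        times.append(current)
        current += rng.expovariate(rate)  # add the next inter-event time
    return times


def mean_event_count(rate, t, runs, seed=0):
    """Average number of events in [0, t] over independent simulated runs."""
    rng = random.Random(seed)
    total = sum(len(simulate_event_times(rate, t, rng)) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    rate, t = 3.0, 2.0  # illustrative values, not taken from the text
    print("simulated mean count:", mean_event_count(rate, t, runs=5000))
    print("expected count rate * t:", rate * t)
```

The simulated mean should be close to rate * t, up to sampling noise that shrinks as the number of runs grows.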
Assume clone length L, genome length G, and choose N clones
at random. What is the expected fraction of the genome covered by
clones?
For a random point b and an arbitrary clone C, the probability
that b is included in C is given by:
$P(b \in C) = L/G$ (4)
and therefore, since the clones are chosen independently, the probability
that b lies outside all N clones is given by:
$P(b \text{ is not covered}) = (1 - L/G)^N \approx e^{-NL/G}$ (5)
with the last approximation being valid when $N$ is large
and $L \ll G$.
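The fraction $R = NL/G$ is said
to be the redundancy of the clone set.
The expected fraction of non-covered
genome is given by
$E[\text{fraction not covered}] = e^{-R}$ (6)
where R is the redundancy of the clone set, so the expected fraction covered is $1 - e^{-R}$.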
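To see how close the exponential approximation is to the exact probability, the two expressions, (1 - L/G)^N and e^(-NL/G), can be evaluated side by side. This is a minimal sketch with hypothetical function names; the clone and genome lengths are illustrative assumptions, not values from the text.

```python
import math


def uncovered_exact(n_clones, clone_len, genome_len):
    """Probability that a random point lies outside all clones: (1 - L/G)^N."""
    return (1.0 - clone_len / genome_len) ** n_clones


def uncovered_approx(n_clones, clone_len, genome_len):
    """Exponential approximation e^(-R), with redundancy R = N * L / G."""
    redundancy = n_clones * clone_len / genome_len
    return math.exp(-redundancy)


if __name__ == "__main__":
    # Illustrative values only (not from the text): L = 40 kb, G = 3 Mb.
    clone_len, genome_len = 40_000, 3_000_000
    for n_clones in (75, 150, 375):  # redundancy R = 1, 2, 5
        redundancy = n_clones * clone_len / genome_len
        exact = uncovered_exact(n_clones, clone_len, genome_len)
        approx = uncovered_approx(n_clones, clone_len, genome_len)
        print(f"R={redundancy:.1f}  exact={exact:.4f}  approx={approx:.4f}")
```

With clones this short relative to the genome, the two values should agree to about two decimal places.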
Table 9.1 shows that a redundancy factor of 2 to 5 gives
good coverage of the genome segment considered
(between 86.5% and 99.3%).
Table 9.1: Coverage of genome segment depending on redundancy factor
R    Coverage
1    0.63
2    0.865
3    0.95
4    0.98
5    0.993
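The coverage values in the table can be compared against a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: the genome is treated as circular (so that every point has the same chance of being covered), clone starts are uniform, and the function names, genome length, and number of trials are hypothetical choices rather than anything specified in the text.

```python
import math
import random


def covered_fraction(n_clones, clone_len, genome_len, rng):
    """Fraction of a circular genome covered by clones with uniform random starts."""
    intervals = []
    for _ in range(n_clones):
        start = rng.uniform(0.0, genome_len)
        end = start + clone_len
        if end <= genome_len:
            intervals.append((start, end))
        else:
            # The clone wraps around the end of the circular genome: split it in two.
            intervals.append((start, genome_len))
            intervals.append((0.0, end - genome_len))
    intervals.sort()
    covered = 0.0
    cur_start, cur_end = None, None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # A gap was found: close the current merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_len


def mean_coverage(redundancy, clone_len=1.0, genome_len=1000.0, trials=20, seed=0):
    """Average covered fraction over several trials at a given redundancy R."""
    rng = random.Random(seed)
    n_clones = round(redundancy * genome_len / clone_len)
    total = sum(covered_fraction(n_clones, clone_len, genome_len, rng)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for r in (1, 2, 3, 4, 5):
        print(f"R={r}  simulated={mean_coverage(r):.3f}  1-exp(-R)={1 - math.exp(-r):.3f}")
```

The simulated averages should track 1 - e^(-R), with small deviations due to sampling noise.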
Assume clone length = 1, denote N = number of clones and
R = redundancy factor, and assume that the clone starting
positions follow a Poisson process with rate $R$ (with unit clone length,
$R = N/G$ is the expected number of clone starts per unit length).
We define a minimal overlap factor $\theta$
between clones to
identify overlap, that is, two clones are defined to overlap only
if they share a section of length at least $\theta$.
A set of clones covering a continuous segment of the genome,
together with their physical distances is called a contig.
Contigs are sometimes referred to as islands.
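The grouping of clones into contigs can be sketched in a few lines. This is a minimal illustration, not the notes' own procedure: the function name `find_contigs` is hypothetical, clones are assumed to have equal length, and the minimal overlap factor is passed as `theta`, measured in the same units as the clone length.

```python
def find_contigs(starts, theta, clone_len=1.0):
    """Group equal-length clones into contigs, given their start positions.

    Two clones are treated as overlapping only if they share a section of
    length at least theta, i.e. their starts differ by at most
    clone_len - theta. Because all clones have the same length, it is enough
    to compare each clone with the previous one in sorted order.
    """
    contigs = []
    for start in sorted(starts):
        if contigs and start - contigs[-1][-1] <= clone_len - theta:
            contigs[-1].append(start)  # detectable overlap: extend the contig
        else:
            contigs.append([start])  # no detectable overlap: start a new contig
    return contigs


if __name__ == "__main__":
    starts = [5.0, 0.0, 2.9, 0.5, 2.0]  # illustrative start positions
    print(find_contigs(starts, theta=0.2))
    print(find_contigs(starts, theta=0.05))
```

Raising `theta` can only split contigs, since overlaps shorter than `theta` are no longer detected. In the example, the clones starting at 2.0 and 2.9 share a section of length 0.1, so they form one contig for `theta=0.05` and two for `theta=0.2`.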