The Statistical Model

We now present the statistical model underlying the mapping method described above. We assume the following:

1. Clones are uniformly and independently distributed along the target genome.
2. Clones are of equal length l.
3. Probe occurrences along the genome are modeled by a Poisson process.
4. The Poisson rate λ is identical for all probes.
5. The noise statistically behaves as follows:
   - False positive errors: a Poisson process with parameter β.
   - False negative errors: these occur independently, with probability α for each hybridization.
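Assumptions 1 to 4 are enough to simulate clone and probe positions. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code. The function names, the genome length, and the convention of drawing clone start points from [0, G - l] so that every clone lies inside the genome are all hypothetical choices.

```python
import random

def sample_clone_starts(n_clones, genome_length, clone_length, rng):
    """Assumptions 1 and 2: equal-length clones with independent, uniform start points.

    Starts are drawn from [0, genome_length - clone_length] so every clone lies
    inside the genome (a boundary convention assumed here, not stated in the text).
    """
    return sorted(rng.uniform(0.0, genome_length - clone_length) for _ in range(n_clones))

def sample_probe_occurrences(rate, genome_length, rng):
    """Assumptions 3 and 4: a homogeneous Poisson process with the same rate for every probe.

    Gaps between consecutive occurrences are exponential with mean 1 / rate.
    """
    positions = []
    x = rng.expovariate(rate)
    while x < genome_length:
        positions.append(x)
        x += rng.expovariate(rate)
    return positions

rng = random.Random(0)  # fixed seed for reproducibility
starts = sample_clone_starts(n_clones=7, genome_length=100.0, clone_length=20.0, rng=rng)
probe = sample_probe_occurrences(rate=0.03, genome_length=100.0, rng=rng)
print(starts)
print(probe)
```

Summing exponential gaps is one standard way to generate a homogeneous Poisson process. An equally valid alternative is to draw a Poisson-distributed count and then place that many points uniformly.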

The hybridization scenario is illustrated in figure 10.8. Clones are drawn as horizontal lines, and the random occurrences of a single nonunique probe are marked by dotted vertical lines. We denote by A the probe-clone occurrence matrix: A_{i,j} = k if probe j occurs k times in clone i. In this example the probe occurs 3 times along a genome section covered by 7 clones. Because clones overlap, a single occurrence can lie in several clones, so the probe's column in the occurrence matrix is (1, 1, 0, 0, 1, 2, 0). Probe occurrences follow a Poisson process with rate λ and every clone has length l, so the number of times probe j occurs in clone i is Poisson distributed with mean λl:

$\begin{displaymath}Pr(A_{i,j} = k) = \frac{(\lambda l)^{k} e^{- \lambda l}}{k!} \qquad (10.14)\end{displaymath}$
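The following Python sketch builds one column of the occurrence matrix A from clone and probe positions and evaluates equation (10.14). The function names are hypothetical, and treating each clone as the half-open interval [start, start + l) is an assumed convention. The coordinates in the example are invented so that they reproduce the column (1, 1, 0, 0, 1, 2, 0) from the text. They are not measured from figure 10.8.

```python
import math

def occurrence_column(clone_starts, clone_length, probe_positions):
    """Column j of A: entry i is the number of probe occurrences inside clone i.

    Each clone is treated as the half-open interval [start, start + clone_length).
    """
    return [
        sum(1 for p in probe_positions if start <= p < start + clone_length)
        for start in clone_starts
    ]

def occurrence_pmf(k, rate, clone_length):
    """Equation (10.14): Pr(A_ij = k) = (lambda * l)^k * exp(-lambda * l) / k!."""
    mu = rate * clone_length
    return mu ** k * math.exp(-mu) / math.factorial(k)

# Hypothetical layout: seven clones of length 10 and three occurrences of one probe.
starts = [0, 4, 20, 30, 50, 55, 80]
occurrences = [8, 56, 62]
print(occurrence_column(starts, 10, occurrences))  # [1, 1, 0, 0, 1, 2, 0]
print(occurrence_pmf(0, rate=0.1, clone_length=10))  # exp(-1), about 0.368
```

The column sums to 5 even though the probe occurs only 3 times. The occurrence at position 56 falls in two overlapping clones, and the occurrence at position 8 also falls in two.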

We denote by B the probe-clone hybridization matrix: B_{i,j} = 1 if probe j hybridized with clone i, and B_{i,j} = 0 otherwise. The vector $\overrightarrow{B_{i}}$ of the hybridizations of clone i with all the probes is called its hybridization fingerprint. In the absence of noise, hybridization occurs if and only if the probe occurs at least once in the clone, so the column of B for the probe in figure 10.8 would be (1, 1, 0, 0, 1, 1, 0). Experimental noise can produce both false positive hybridizations (B_{i,j} = 1 when A_{i,j} = 0) and false negative hybridizations (B_{i,j} = 0 when A_{i,j} > 0).
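The noise-free rule linking A to B and the two error types can be written as a short Python sketch. The function names and the noisy column `observed_b` are hypothetical. The occurrence column (1, 1, 0, 0, 1, 2, 0) is the example from the text.

```python
def noise_free_fingerprint(a_column):
    """Without noise, B_ij = 1 exactly when A_ij > 0; multiplicities are lost."""
    return [1 if a > 0 else 0 for a in a_column]

def classify_entry(a, b):
    """Label one (A_ij, B_ij) pair using the error definitions in the text."""
    if b == 1 and a == 0:
        return "false positive"
    if b == 0 and a > 0:
        return "false negative"
    return "consistent"

column_a = [1, 1, 0, 0, 1, 2, 0]  # occurrence column from the figure 10.8 example
print(noise_free_fingerprint(column_a))  # [1, 1, 0, 0, 1, 1, 0]

# A hypothetical noisy observation of the same column.
observed_b = [1, 0, 0, 1, 1, 1, 0]
print([classify_entry(a, b) for a, b in zip(column_a, observed_b)])
```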

Hybridization fingerprints of intersecting clones are correlated, and this correlation is used to estimate the overlap of clone pairs. Noise weakens the correlation between the fingerprints of overlapping clones, but Bayesian inference can still identify overlap, provided that enough probes are used. "Soft decision" hybridization signals may also be available, and such signals carry more information about probe occurrences than binary signals do. However, a continuous signal value does not directly correspond to a hybridization probability. We therefore assume that a threshold is applied to turn each hybridization signal into a binary one, and we define the hybridization matrix B as a binary matrix such that B_{i,j} = 1 if probe j produced a positive hybridization signal with clone i. The matrix B is the actual experimental data and the input to the construction algorithm. It contains noise and carries no information about multiplicities. Using the statistical model, we can write the following equation:

$\begin{eqnarray*}Pr(B_{i,j} =1 \vert A_{i,j}) &= & Pr(\mbox{false positive}) + \left(1 - Pr(\mbox{false positive})\right)\left(1-\alpha^{A_{i,j}}\right)\\ &= & 1-e^{- \beta l \alpha} \alpha^{A_{i,j}} \end{eqnarray*}$
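The thresholding step and the equation above can be sketched in Python as follows. This sketch reads α as the false-negative probability and β as the false-positive parameter, and it implements the final expression exactly as written. The threshold value and all function names are hypothetical. The marginal Pr(B_{i,j} = 1) averages the conditional probability over equation (10.14). Because the sum over k of Pr(A_{i,j} = k) α^k equals e^{-λl(1-α)}, the average collapses to the closed form 1 - e^{-βlα - λl(1-α)}. The code checks the series against that closed form.

```python
import math

def binarize(signals, threshold):
    """Turn the soft hybridization signals of one clone into a binary row of B.

    The threshold is a hypothetical parameter; the text does not say how it is chosen.
    """
    return [1 if s >= threshold else 0 for s in signals]

def pr_positive_given_occurrences(a, alpha, beta, clone_length):
    """Pr(B_ij = 1 | A_ij = a) = 1 - exp(-beta * l * alpha) * alpha**a, as written above.

    alpha: false-negative probability; beta: false-positive parameter; l: clone length.
    """
    return 1.0 - math.exp(-beta * clone_length * alpha) * alpha ** a

def pr_positive_marginal(alpha, beta, rate, clone_length, k_max=100):
    """Pr(B_ij = 1): average the conditional probability over Pr(A_ij = k) from (10.14)."""
    mu = rate * clone_length
    pmf = math.exp(-mu)  # Pr(A_ij = 0)
    total = 0.0
    for k in range(k_max + 1):
        total += pmf * pr_positive_given_occurrences(k, alpha, beta, clone_length)
        pmf *= mu / (k + 1)  # Pr(A_ij = k + 1) from Pr(A_ij = k), avoiding huge factorials
    return total

def pr_positive_marginal_closed_form(alpha, beta, rate, clone_length):
    """Closed form of the same sum: 1 - exp(-beta*l*alpha - lambda*l*(1 - alpha))."""
    l = clone_length
    return 1.0 - math.exp(-beta * l * alpha - rate * l * (1.0 - alpha))

# Illustrative parameter values only.
print(binarize([0.9, 0.2, 0.5], threshold=0.5))  # [1, 0, 1]
print(pr_positive_marginal(0.2, 0.01, 0.1, 20.0))
print(pr_positive_marginal_closed_form(0.2, 0.01, 0.1, 20.0))
```

Updating the Poisson probabilities incrementally avoids computing large factorials directly, which would overflow a float for large k.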

Itshack Pe'er
1999-03-21