Introduction



Physical mapping using hybridization fingerprints of short oligonucleotides was first suggested by Poustka et al. in 1986 [6]. In this technique, short labeled DNA sequences, called probes, attach (hybridize) to positions along the target DNA matching their own DNA sequence. The probes are nonunique, i.e., they occur at many points along the genome, and each typically hybridizes with $10\%$ to $50\%$ of the clones. The set of probes that hybridize to a clone forms its fingerprint, and overlapping clones can be identified by their similar fingerprints. See figure 10.8 for an illustration of this hybridization scenario. Poustka et al. [6] suggested this method in order to eliminate the need to process clones individually, as the restriction digestion technique requires. They reported preliminary computer simulations demonstrating feasibility, and suggested the use of Bayesian inference in the data analysis. More detailed strategies were offered by Michiels et al. [5], who used a likelihood ratio based on a detailed statistical model to make overlap decisions and also discussed experimental errors. Craig et al. [2] used short oligonucleotides to order cosmid clones covering the Herpes Simplex Virus type I (HSV-I) genome. The clones were ordered manually. As each probe occurred only once or twice along the short ($\sim 140$ kb) genome, this experiment does not represent the general problem.
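The hybridization scenario described above can be sketched as a small simulation. This is only an illustration under simplifying assumptions that are not part of the cited work: clones have equal length, probe occurrences are placed uniformly at random, and hybridization is error free. The function names and parameters (`simulate_fingerprints`, `agreement`, `occ_per_probe`) are hypothetical.

```python
import random

def simulate_fingerprints(genome_len, clone_len, n_clones, n_probes,
                          occ_per_probe, seed=0):
    """Return (lefts, fingerprints) for a toy, error-free experiment.

    fingerprints[i][j] is 1 if probe j occurs somewhere inside clone i.
    """
    rng = random.Random(seed)
    # Left endpoints of the clones, sorted by true genome position.
    lefts = sorted(rng.randrange(genome_len - clone_len + 1)
                   for _ in range(n_clones))
    # Each nonunique probe occurs at several random genome positions.
    probe_sites = [[rng.randrange(genome_len) for _ in range(occ_per_probe)]
                   for _ in range(n_probes)]
    fingerprints = [
        [int(any(left <= s < left + clone_len for s in sites))
         for sites in probe_sites]
        for left in lefts
    ]
    return lefts, fingerprints

def agreement(fp_a, fp_b):
    """Fraction of probes on which two fingerprints agree."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)

lefts, fps = simulate_fingerprints(
    genome_len=100_000, clone_len=5_000, n_clones=50,
    n_probes=200, occ_per_probe=10, seed=1)
```

In such a simulation, clones that overlap substantially tend to agree on more probes than distant clones. A naive agreement score like this one is only a stand-in for the likelihood-based overlap decisions mentioned in the text.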

**Figure 10.8:** Clones and non-unique probes. The clones are the horizontal lines. The random occurrences of a single nonunique probe are marked by the dotted vertical lines.

**Figure 10.9:** Physical map example. The short horizontal lines are the clones with their x coordinates corresponding to the position on the target genome. The y coordinates correspond to the clone order in the constructed map. Note that each point on the target genome is covered by many clones. The total length of the clones divided by the length of the genome is called the clone coverage (10 in this example).
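The clone coverage defined in the caption of figure 10.9 is a simple ratio. The sketch below computes it for made-up numbers; the function name `clone_coverage` and the clone and genome sizes are illustrative assumptions, chosen only so that the result matches the coverage of 10 mentioned in the caption.

```python
def clone_coverage(clone_lengths, genome_length):
    """Total clone length divided by genome length."""
    return sum(clone_lengths) / genome_length

# Hypothetical library: 100 clones of 40 kb each on a 400 kb genome.
coverage = clone_coverage([40_000] * 100, 400_000)
```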

The location of the clones along the target genome is not directly known to the experimenters. Mapping data (such as hybridization data) produced by the experiment is used to reconstruct the map. A list assigning every clone its estimated position along the genome is a solution to the mapping problem. According to equation 10.7 in theorem 10.5, with sufficient coverage the whole map is usually one contig. A plot of clone order in the constructed map vs. real clone position (see figure 10.9) provides a visual measure of map quality. If the order of the clones in the constructed map is completely correct, then the computed left endpoints of the clones increase as their true values increase (or decrease, if the orders in the true and constructed maps happen to be reversed, as in the example of figure 10.9). Minor ordering errors appear as small deviations from monotonicity, as in figure 10.9, and show that the construction is still essentially correct. Very small errors, which do not change the clone order, cannot be observed from the plot. A completely random solution corresponds to randomly placed clones, whereas a nonrandom solution containing several large errors translates into several randomly placed broken contigs, each with an approximately correct intra-contig order.
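The monotonicity criterion above can also be turned into a number. The following sketch is one possible score, not a measure taken from the source: it counts the fraction of clone pairs whose order in the constructed map disagrees with their true order, and scores the better of the two orientations because a map and its mirror image are equivalent. The name `discordant_fraction` is hypothetical, and the true left endpoints are assumed to be known, as in a simulation.

```python
from itertools import combinations

def discordant_fraction(true_lefts_in_map_order):
    """Fraction of clone pairs placed in the wrong relative order.

    The input lists the true left endpoint of each clone, in the order
    in which the clones appear in the constructed map.
    """
    pairs = list(combinations(true_lefts_in_map_order, 2))
    if not pairs:
        return 0.0
    decreasing = sum(a > b for a, b in pairs)  # errors if the map is forward
    increasing = sum(a < b for a, b in pairs)  # errors if the map is reversed
    return min(decreasing, increasing) / len(pairs)

perfect = discordant_fraction([10, 20, 30, 40])     # fully monotone
mirrored = discordant_fraction([40, 30, 20, 10])    # reversed but correct
one_swap = discordant_fraction([10, 30, 20, 40])    # one adjacent swap
```

A value of 0 corresponds to a monotone plot; a small value corresponds to the minor deviations described above, and a value near 0.5 is what a random ordering would tend to give.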


Itshack Pe'er
1999-03-21