Pairwise Distances

Given a measure of the distance between each pair of species, a simple approach to the phylogeny problem is to find a tree that predicts the observed set of distances as closely as possible. This discards some of the information in the data matrix M, reducing it to a table of pairwise distances. However, in many cases most of the evolutionary information appears to be conveyed by these distances. For the analysis in this section, we first need to define an additive, continuous distance function, so that the distance between two species is expected to be proportional to the total branch length along the path between them in the tree. Thus if species a and b are each joined to a common internal node v by edges of lengths d_av and d_bv (see figure 9.8), the distance between them is d_ab = d_av + d_bv. Furthermore, given the distances between three species, d_ab, d_ac, and d_bc, we can easily calculate the inner distances d_av, d_bv, and d_cv by solving a system of three linear equations in three unknowns. Figure 9.8 illustrates a small tree, and table 9.2 contains the distances it predicts.
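For the three-species case, the linear system and its solution can be written out explicitly. The following is a short derivation sketch in the notation of the text, assuming v denotes the single internal node at which the edges to a, b, and c meet:

```latex
\begin{align*}
d_{ab} &= d_{av} + d_{bv}, \\
d_{ac} &= d_{av} + d_{cv}, \\
d_{bc} &= d_{bv} + d_{cv}.
\end{align*}
% Adding the first two equations and subtracting the third
% isolates d_{av}; the other two follow by symmetry.
\begin{align*}
d_{av} &= \tfrac{1}{2}\,(d_{ab} + d_{ac} - d_{bc}), \\
d_{bv} &= \tfrac{1}{2}\,(d_{ab} + d_{bc} - d_{ac}), \\
d_{cv} &= \tfrac{1}{2}\,(d_{ac} + d_{bc} - d_{ab}).
\end{align*}
```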

**Figure 9.8:** A small tree with 3 species - a, b, and c. The branch lengths correspond to the pairwise distances in table 9.2.

**Table 9.2:** Distances d_ij predicted by the tree in figure 9.8.

	a	b	c
a	0	0.08	0.45
b	0.08	0	0.43
c	0.45	0.43	0
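As a quick check, the branch lengths implied by table 9.2 can be recovered by solving the three-species system directly. The following is a minimal Python sketch, not part of the original lecture; the function name `branch_lengths` is hypothetical, and v is assumed to be the single internal node joining a, b, and c.

```python
def branch_lengths(d_ab, d_ac, d_bc):
    """Return (d_av, d_bv, d_cv) for a three-leaf tree with internal node v.

    Solves d_ab = d_av + d_bv, d_ac = d_av + d_cv, d_bc = d_bv + d_cv
    in closed form.
    """
    d_av = (d_ab + d_ac - d_bc) / 2
    d_bv = (d_ab + d_bc - d_ac) / 2
    d_cv = (d_ac + d_bc - d_ab) / 2
    return d_av, d_bv, d_cv


# Pairwise distances taken from table 9.2.
d_av, d_bv, d_cv = branch_lengths(d_ab=0.08, d_ac=0.45, d_bc=0.43)

# Two-decimal formatting hides floating-point rounding noise.
print(f"d_av = {d_av:.2f}, d_bv = {d_bv:.2f}, d_cv = {d_cv:.2f}")
# prints: d_av = 0.05, d_bv = 0.03, d_cv = 0.40
```

Summing any two of the recovered lengths reproduces the corresponding entry of the table, for example d_av + d_bv = 0.05 + 0.03 = 0.08 = d_ab.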

We will give two examples of how distances can be computed so that they meet these requirements: one for proteins, and another for DNA sequences.

Itshack Pe'er
1999-02-18