One of the more statistically justified methods for approximating a distance matrix
by a tree is the least squares approach. Our goal is to find a tree $T$
whose leaves are the $n$ given species and which predicts distances $d_{ij}$
between the species, such that the following expression is minimized:
$$SSQ(T) = \sum_{i=1}^{n} \sum_{j \neq i} w_{ij} \left( D_{ij} - d_{ij} \right)^2 \qquad (9.6)$$
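where $D_{ij}$ is the observed distance between species $i$ and $j$, $d_{ij}$ is the
distance predicted by $T$ (the total branch length on the path between $i$ and $j$),
and $w_{ij}$ are given weights. The SSQ measures the discrepancy between the observed
distances $D_{ij}$ and the distances $d_{ij}$ predicted by $T$. The weights
$w_{ij}$ are usually all 1, or $w_{ij} = 1/D_{ij}^2$.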
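To make the objective concrete, here is a minimal sketch that evaluates SSQ(T) for a tree whose topology and branch lengths are already fixed. It assumes a representation not given in the notes: the tree is a list of `(node, node, branch_length)` edges, internal nodes carry arbitrary hypothetical names, and distances and weights are dicts keyed by ordered leaf pairs. The predicted distance $d_{ij}$ is the sum of branch lengths on the unique path between leaves $i$ and $j$.

```python
def path_lengths(edges, leaves):
    """Predicted distances d[(i, j)]: total branch length on the path from leaf i to leaf j."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    d = {}
    for src in leaves:
        dist = {src: 0.0}
        stack = [src]
        while stack:  # depth-first walk; in a tree each node is reached by exactly one path
            node = stack.pop()
            for nbr, length in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + length
                    stack.append(nbr)
        for dst in leaves:
            if dst != src:
                d[(src, dst)] = dist[dst]
    return d


def fitch_margoliash_weights(D):
    """Weights w_ij = 1 / D_ij^2 (assumes every observed distance is positive)."""
    return {pair: 1.0 / dist ** 2 for pair, dist in D.items()}


def ssq(D, edges, leaves, weights=None):
    """Weighted sum of squared errors over ordered pairs i != j; weights default to all 1."""
    d = path_lengths(edges, leaves)
    total = 0.0
    for i in leaves:
        for j in leaves:
            if i != j:
                w = 1.0 if weights is None else weights[(i, j)]
                total += w * (D[(i, j)] - d[(i, j)]) ** 2
    return total
```

For example, with edges `A-u 1`, `B-u 2`, `u-v 1`, `C-v 1`, `D-v 2`, observed distances equal to the path lengths give SSQ 0. Raising only $D_{AB}$ from 3 to 4 gives SSQ 2 with unit weights, because the pair is counted once in each order.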
Problem 9.10
Least Squares Tree.
INPUT: The distance $D_{ij}$ between species $i$ and $j$, for each pair $i \neq j$,
and a corresponding set of weights $w_{ij}$.
QUESTION: Find the phylogenetic tree $T$, with the species as its leaves,
that minimizes $SSQ(T)$.
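For very small inputs, Problem 9.10 can be solved directly. The sketch below handles four species: there are only three unrooted binary topologies, and for each fixed topology the best branch lengths come from a linear weighted least-squares fit, since every $d_{ij}$ is a sum of branch lengths. This is an illustrative assumption-laden sketch, not the method of these notes: it solves the normal equations in plain Python, counts each unordered pair once (which halves a sum over ordered pairs but leaves the minimizer unchanged), and does not constrain branch lengths to be non-negative. Distances and weights are assumed to be dicts keyed by `frozenset` pairs, and the function names are hypothetical.

```python
from itertools import combinations


def solve(M, y):
    """Solve the small square system M x = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def fit_quartet(D, W, split):
    """Weighted least-squares branch lengths for the unrooted quartet p,q | r,s.

    Returns (residual, lengths): pendant branches of p, q, r, s, then the internal branch.
    """
    (p, q), (r, s) = split
    leaves = [p, q, r, s]
    pairs = list(combinations(leaves, 2))
    rows = []
    for i, j in pairs:
        row = [1.0 if leaf in (i, j) else 0.0 for leaf in leaves]
        row.append(1.0 if (i in (p, q)) != (j in (p, q)) else 0.0)  # path crosses the internal branch
        rows.append(row)
    ys = [D[frozenset(pair)] for pair in pairs]
    ws = [W[frozenset(pair)] for pair in pairs]
    k, m = len(rows[0]), len(rows)
    # Normal equations (A^T W A) x = A^T W y for the design matrix A given by `rows`.
    AtWA = [[sum(ws[t] * rows[t][a] * rows[t][b] for t in range(m)) for b in range(k)] for a in range(k)]
    AtWy = [sum(ws[t] * rows[t][a] * ys[t] for t in range(m)) for a in range(k)]
    lengths = solve(AtWA, AtWy)
    residual = sum(ws[t] * (ys[t] - sum(rows[t][a] * lengths[a] for a in range(k))) ** 2
                   for t in range(m))
    return residual, lengths


def least_squares_quartet(D, W, a, b, c, d):
    """Exhaustive search over the three unrooted topologies on four leaves."""
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    residual, lengths, split = min((fit_quartet(D, W, s) + (s,) for s in splits),
                                   key=lambda t: t[0])
    return split, residual, lengths
```

If the observed distances happen to be exactly the path lengths of some tree, the correct split fits with (numerically) zero residual, while the other two splits cannot fit exactly.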
In general, finding the least squares tree is an NP-complete problem [2].
We will discuss two polynomial-time heuristics: UPGMA and Neighbor-Joining. We have already
studied these algorithms in lecture #5, where we used them to add strings one at a time
to a growing multiple alignment, thereby obtaining a progressive alignment.
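One way to see why exhaustive search over trees is hopeless beyond a few species (separately from the NP-completeness result, which is a stronger statement) is to count the candidate topologies. The number of unrooted binary trees with $n$ labelled leaves is $(2n-5)!! = 1 \cdot 3 \cdot 5 \cdots (2n-5)$ for $n \ge 3$, and each topology would still need its branch lengths fitted. The helper name below is hypothetical.

```python
def count_unrooted_binary_trees(n):
    """Number of unrooted binary trees with n labelled leaves: (2n-5)!! for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply the odd numbers 3, 5, ..., 2n-5
        count *= k
    return count


for n in (4, 10, 20):
    print(n, count_unrooted_binary_trees(n))
```

Four species give the three quartet topologies, but already at ten species there are 2,027,025 topologies, and the count grows faster than exponentially.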
Itshack Pe'er 1999-02-18