We will present an algorithm by Sankoff [15] which is a generalization of the Fitch algorithm.
Sankoff's algorithm:
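Step 1: We will compute, for each node v and each state t, a quantity $S^c_t(v)$, which is the minimum cost of the subtree whose root is v given $v_c = t$. Here $C^c_{ij}$ denotes the cost of changing character c from state i to state j. The order of computation will be, as in step 1 of Fitch, postorder. For each leaf v:
$$S^c_t(v) = \begin{cases} 0 & \text{if } v_c = t \\ \infty & \text{otherwise} \end{cases} \tag{2}$$
For an internal node v, whose children are u and w:
$$S^c_t(v) = \min_i \{ C^c_{ti} + S^c_i(u) \} + \min_j \{ C^c_{tj} + S^c_j(w) \} \tag{3}$$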
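The bottom-up pass of step 1 can be sketched in Python as follows. This is a minimal illustration for a single character, not a reference implementation: the function name `sankoff_costs`, the dictionary-based tree encoding, and the example tree with unit change costs are all assumptions made here. The sum over children reduces to the two-child recurrence above when the tree is binary.

```python
import math

INF = math.inf


def sankoff_costs(children, leaf_state, cost, root):
    """Step 1 for one character: return S[v][t], the minimum cost of the
    subtree rooted at v given that v is assigned state t."""
    k = len(cost)  # number of states
    S = {}

    def visit(v):
        kids = children.get(v, [])
        if not kids:
            # Leaf: cost 0 for the observed state, infinity for any other.
            S[v] = [0 if t == leaf_state[v] else INF for t in range(k)]
            return
        for u in kids:  # postorder: children are processed before v
            visit(u)
        # For each state t of v, add the cheapest (change + subtree) cost
        # of every child.
        S[v] = [
            sum(min(cost[t][i] + S[u][i] for i in range(k)) for u in kids)
            for t in range(k)
        ]

    visit(root)
    return S


# Hypothetical example: three states, every change costs 1.
cost = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
children = {"r": ["x", "C"], "x": ["A", "B"]}
leaf_state = {"A": 0, "B": 1, "C": 1}

S = sankoff_costs(children, leaf_state, cost, "r")
print(S["x"])  # [1, 1, 2]
print(S["r"])  # [2, 1, 3]
```

The smallest entry in the root's row, here 1, is the minimum cost of the tree for this character.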
Step 2: Based on the numbers $S^c_t(v)$ calculated in step 1, we will now determine the optimal values for each character c in the internal nodes. We will traverse the tree in preorder this time:
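For the root node r, we will choose
$$r_c = \arg\min_t S^c_t(r).$$
For any other node v, with parent node u, set:
$$v_c = \arg\min_t \{ C^c_{u_c t} + S^c_t(v) \}$$
where $C^c_{u_c t}$ is the cost of the change from the parent's state $u_c$ to state t.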
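The top-down pass of step 2 can be sketched in the same spirit. The function name `sankoff_assign` and the tree encoding are assumptions made for illustration, and the table `S` is written out by hand for a small hypothetical tree with unit change costs rather than computed here. Ties are broken in favour of the lowest state index, which is an arbitrary choice.

```python
inf = float("inf")


def sankoff_assign(children, S, cost, root):
    """Step 2 for one character: choose a state for every node, visiting
    each parent before its children."""
    k = len(cost)
    # Root: the state with the smallest subtree cost.
    state = {root: min(range(k), key=lambda t: S[root][t])}
    stack = [root]
    while stack:
        u = stack.pop()
        uc = state[u]
        for v in children.get(u, []):
            # Child: cheapest combination of the change from the parent's
            # state and the cost of the child's own subtree.
            state[v] = min(range(k), key=lambda t: cost[uc][t] + S[v][t])
            stack.append(v)
    return state


# Hypothetical input: unit change costs and a hand-written step 1 table
# for the tree r -> (x, C), x -> (A, B) with leaf states A=0, B=1, C=1.
cost = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
children = {"r": ["x", "C"], "x": ["A", "B"]}
S = {
    "A": [0, inf, inf],
    "B": [inf, 0, inf],
    "C": [inf, 0, inf],
    "x": [1, 1, 2],
    "r": [2, 1, 3],
}

state = sankoff_assign(children, S, cost, "r")
print(state["r"], state["x"])  # 1 1
```

The leaves recover their observed states automatically, because every other state has infinite cost in their rows of the table.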
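Complexity: For every node we do $O(k^2)$ work in step 1 (for each of the k states, a minimum over the k states of each child) and $O(k)$ work in step 2, meaning $O(nk^2)$ per character for a tree with n nodes. The algorithm should be applied once for each character, with a total complexity of $O(mnk^2)$ for m characters.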
Weighted Characters
It is possible to assign weights not only to state changes, but also to the characters themselves. Technically, this means assigning a number $W_c$ to each character c, and rewriting equation 8.5 to read:
$$S(T) = \sum_{(v,u) \in E(T)} \sum_c W_c \cdot C^c_{v_c u_c} \tag{5}$$
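As a small illustration of character weights, the sketch below scores a fully labelled tree by summing, over every edge and every character, the change cost multiplied by that character's weight. The function name `weighted_tree_cost`, the data layout, and the example weights are assumptions chosen for the example, not values taken from the text.

```python
def weighted_tree_cost(edges, labels, cost, weights):
    """Total cost of a fully labelled tree with per-character weights.

    labels[v][c] is the state of character c at node v, cost[c][i][j] is
    the cost of changing character c from state i to state j, and
    weights[c] is the weight of character c.
    """
    total = 0
    for u, v in edges:
        for c, w in enumerate(weights):
            total += w * cost[c][labels[u][c]][labels[v][c]]
    return total


# Hypothetical example: two characters with unit change costs, where the
# first character is given twice the weight of the second.
unit = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
cost = [unit, unit]
weights = [2, 1]
edges = [("r", "x"), ("r", "C"), ("x", "A"), ("x", "B")]
labels = {
    "r": (1, 0),
    "x": (1, 0),
    "A": (0, 0),
    "B": (1, 0),
    "C": (1, 2),
}

total = weighted_tree_cost(edges, labels, cost, weights)
print(total)  # 3
```

Because each character is optimized independently on a fixed tree, multiplying a character's costs by a positive weight does not change its optimal labelling. The weights matter when the totals of different trees are compared.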
Where do we get the weights $W_c$? For instance, if we are working with a DNA sequence, and we know the reading frame, we can make use of the fact that changes in the third codon position are more frequent, since in many cases they don't change the amino acid coded.
In section 8.2.2 we will see another possible source for weights: compatible characters. In short, we will give more weight to characters which seem to fit the tree well than to characters which fit it poorly.