Similarity and Difference
The resemblance of two DNA sequences taken from different organisms can
be explained by the theory that all contemporary genetic material
descends from one common ancestral DNA. According to this theory,
mutations occurred during the course of evolution, creating differences
between families of contemporary species. Most of these changes are due
to local mutations, each of which modifies the DNA sequence in a
specific manner. These local modifications between nucleotide
sequences, or more generally between strings over an arbitrary
alphabet, can be of three kinds:
- Insertion - inserting a letter (or several letters) into the sequence.
- Deletion - deleting a letter (or several letters) from the sequence.
- Substitution - replacing a letter of the sequence with another letter.
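To make the three mutation types concrete, here is a minimal sketch that applies each of them to a Python string. The function names `insert`, `delete` and `substitute`, and the use of 0-based positions, are illustrative assumptions rather than anything defined in these notes.

```python
# Hypothetical helpers illustrating the three local mutations on a string.
# Positions are 0-based; the names are illustrative, not from the notes.

def insert(seq: str, pos: int, letters: str) -> str:
    """Insert one or more letters into seq just before index pos."""
    return seq[:pos] + letters + seq[pos:]


def delete(seq: str, pos: int, count: int = 1) -> str:
    """Delete count letters from seq, starting at index pos."""
    return seq[:pos] + seq[pos + count:]


def substitute(seq: str, pos: int, letter: str) -> str:
    """Replace the letter at index pos with a different letter."""
    if seq[pos] == letter:
        raise ValueError("a substitution must change the letter")
    return seq[:pos] + letter + seq[pos + 1:]


if __name__ == "__main__":
    s = "ACGT"
    print(insert(s, 2, "TT"))     # ACTTGT
    print(delete(s, 1))           # AGT
    print(substitute(s, 3, "A"))  # ACGA
```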
Insertion and deletion are the reverse of one another: given two
sequences, if inserting a character (or more) into one yields the
other, then equivalently deleting it from the latter sequence
transforms it back into the first one. Because of this reciprocity,
insertions and deletions are usually referred to collectively as
indels.
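The reciprocity means one check suffices to decide whether two sequences are a single-letter indel apart: if deleting one letter from the longer sequence gives the shorter, then inserting that letter into the shorter gives the longer. The sketch below assumes only single-letter indels; `one_indel_apart` is a hypothetical name.

```python
def one_indel_apart(s: str, t: str) -> bool:
    """Return True if one string becomes the other by a single-letter indel.

    Since insertion and deletion are inverses, it is enough to test whether
    deleting one letter from the longer string yields the shorter one.
    """
    if abs(len(s) - len(t)) != 1:
        return False
    longer, shorter = (s, t) if len(s) > len(t) else (t, s)
    # Advance to the first position where the strings disagree. If a
    # single deletion works at all, deleting the letter here also works
    # (any earlier candidate lies in a run of identical letters).
    i = 0
    while i < len(shorter) and longer[i] == shorter[i]:
        i += 1
    return longer[:i] + longer[i + 1:] == shorter
```

For example, `one_indel_apart("ACGT", "ACT")` and `one_indel_apart("ACT", "ACGT")` give the same answer, which is exactly the symmetry described above.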
The notion of distance is derived from the concept of mutations by
assigning a weight to each mutation: given two sequences, the distance
between them is the minimal total weight of a set of mutations
transforming one into the other.
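One common way to compute such a distance is the standard edit-distance dynamic program, sketched below. It assumes a single nonnegative weight for every indel and another for every substitution (the parameters `indel` and `sub` are hypothetical); real applications may use letter-dependent weights. `D[i][j]` holds the distance between the prefixes `s[:i]` and `t[:j]`.

```python
def distance(s: str, t: str, indel: float = 1.0, sub: float = 1.0) -> float:
    """Minimal total weight of indels and substitutions turning s into t.

    Assumes uniform, nonnegative weights (hypothetical defaults of 1).
    """
    n, m = len(s), len(t)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel          # delete every letter of s[:i]
    for j in range(1, m + 1):
        D[0][j] = j * indel          # insert every letter of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else sub
            D[i][j] = min(
                D[i - 1][j - 1] + cost,  # keep or substitute the last letters
                D[i - 1][j] + indel,     # delete s[i-1]
                D[i][j - 1] + indel,     # insert t[j-1]
            )
    return D[n][m]
```

With unit weights this is the classic Levenshtein distance, so `distance("kitten", "sitting")` is 3 (two substitutions and one insertion). Raising `sub` above `2 * indel` makes the program prefer a deletion plus an insertion over a substitution.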
The notion of similarity is derived from the concept of a single
ancestral DNA, by assigning weights that reflect resemblance: given two
sequences, the similarity between them is the maximal sum of such
weights over all ways of aligning one sequence with the other.
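Similarity can be computed with a dynamic program of the same shape as the distance computation, maximizing instead of minimizing. This is the global-alignment recurrence usually credited to Needleman and Wunsch. The scoring values below (`match = +1`, `mismatch = -1`, `gap = -2` per indel letter) are hypothetical choices for illustration only; matching letters earn a positive weight, which is what makes this a measure of resemblance rather than of difference.

```python
def similarity(s: str, t: str, match: float = 1.0,
               mismatch: float = -1.0, gap: float = -2.0) -> float:
    """Maximal total score over all global alignments of s and t.

    The scores are hypothetical: matches are rewarded, while
    mismatches and gaps (indels) are penalized.
    """
    n, m = len(s), len(t)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap            # s[:i] aligned against gaps only
    for j in range(1, m + 1):
        S[0][j] = j * gap            # t[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            S[i][j] = max(
                S[i - 1][j - 1] + pair,  # align s[i-1] with t[j-1]
                S[i - 1][j] + gap,       # s[i-1] against a gap
                S[i][j - 1] + gap,       # t[j-1] against a gap
            )
    return S[n][m]
```

Under these scores, `similarity("ACGT", "AGT")` is 1: three matches (+3) and one gap (-2).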
Peer Itsik
2000-11-20