Similarity and Difference
The resemblance of two DNA sequences taken from different organisms can be
explained by the theory that all contemporary genetic material descends from
a single ancestral DNA. According to this theory, mutations occurred during
the course of evolution, creating differences between
families of contemporary species. Most of these changes are due to
local mutations, each modifying the DNA sequence in a specific
manner. These local modifications of nucleotide sequences, or
more generally of strings over an arbitrary alphabet, can be
any of the following:
- Insertion - adding one or more letters to the sequence.
- Deletion - removing one or more letters from the sequence.
- Substitution - replacing a letter of the sequence by another letter.
Insertion and deletion are the reverse of one another: given two
sequences, if inserting one or more characters into the first
yields the second, then deleting those characters from the second
yields the first. Because of this reciprocity,
insertions and deletions are usually referred to collectively as indels.
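The three local modifications, and the reciprocity between insertion and deletion, can be illustrated with a minimal Python sketch. The helper names insert, delete and substitute, and the example strings, are hypothetical and chosen only for illustration; they are not part of the original notes.

```python
def insert(seq, pos, letters):
    """Return seq with `letters` inserted before index `pos`."""
    return seq[:pos] + letters + seq[pos:]


def delete(seq, pos, length=1):
    """Return seq with `length` letters removed, starting at index `pos`."""
    return seq[:pos] + seq[pos + length:]


def substitute(seq, pos, letter):
    """Return seq with the letter at index `pos` replaced by `letter`."""
    return seq[:pos] + letter + seq[pos + 1:]


first = "ACGT"
second = insert(first, 2, "TT")      # insertion turns the first string into the second
restored = delete(second, 2, 2)      # the matching deletion turns it back
changed = substitute(first, 1, "G")  # a substitution keeps the length unchanged
```

Here restored equals first again, which is exactly the sense in which an insertion into one sequence corresponds to a deletion from the other.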
The notion of distance derives its definition from the
concept of mutations: each mutation is assigned a weight.
Given two strings, the distance between them is the minimal
sum of weights over all series of mutations transforming one into the
other.
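As a rough illustration of this definition, the sketch below computes such a distance with a standard dynamic-programming recurrence. It rests on assumptions not stated in the notes: every mutation affects a single letter, and the weights indel_cost and sub_cost (both defaulting to 1) are hypothetical parameters.

```python
def distance(a, b, indel_cost=1, sub_cost=1):
    """Minimal total weight of single-letter mutations turning a into b.

    The weights are illustrative assumptions, not values from the notes.
    """
    # prev[j] is the distance between the current prefix of a and b[:j];
    # initially the prefix of a is empty, so only insertions are needed.
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel_cost]  # turning a[:i] into "" takes i deletions
        for j in range(1, len(b) + 1):
            change = 0 if a[i - 1] == b[j - 1] else sub_cost
            curr.append(min(
                prev[j] + indel_cost,      # delete a[i-1]
                curr[j - 1] + indel_cost,  # insert b[j-1]
                prev[j - 1] + change,      # keep or substitute the letter
            ))
        prev = curr
    return prev[len(b)]
```

With the default weights, distance("ACGT", "AGT") is 1, since a single deletion suffices. Raising sub_cost to 3 makes distance("ACGT", "AGGT") equal to 2, because one deletion plus one insertion is then cheaper than a substitution.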
The notion of similarity derives its definition from the
concept of a single ancestral DNA: weights are assigned
that correspond to resemblance. Given two strings, the similarity between them is the maximal sum of such weights.
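One common way to make this definition concrete is to score every way of lining up the two strings letter by letter and take the best total. The sketch below does this under assumptions the notes do not specify: the weights match, mismatch and gap are hypothetical, and the whole of both strings must be lined up.

```python
def similarity(a, b, match=1, mismatch=-1, gap=-2):
    """Maximal total weight over all ways of lining up a against b.

    The weights are illustrative assumptions, not values from the notes.
    """
    # prev[j] is the best score for the current prefix of a against b[:j];
    # an empty prefix of a can only be paired with gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] against the empty string: i gaps
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j] + gap,       # a[i-1] is paired with a gap
                curr[j - 1] + gap,   # b[j-1] is paired with a gap
                prev[j - 1] + pair,  # a[i-1] is paired with b[j-1]
            ))
        prev = curr
    return prev[len(b)]
```

With these weights, identical strings score their length, so similarity("ACGT", "ACGT") is 4, while similarity("ACGT", "AGT") is 1: three matching letters and one gap.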
Itshack Pe`er
1999-01-03