Motivation
The concept of a gap in an alignment is
important in many biological applications, because the insertion or
deletion of an entire subsequence often occurs as a single mutational
event. Moreover, many of these single mutational events can create
gaps of quite varying sizes. At the protein level, two protein
sequences might be relatively similar over several intervals but
differ in intervals where one contains a protein subunit that the
other does not.
One concrete illustration of the use of gaps in the alignment
model comes from the problem of cDNA matching [2] (chapter 11).
In this problem, one sequence is much longer than the other, and the alignment best
reflecting their relationship should consist of a few regions of
very high similarity interspersed with 'long' gaps in the shorter
sequence. Note that the matching regions can have mismatches and
spaces, but these should amount only to a small fraction of the region.
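The text distinguishes individual spaces from gaps. For cDNA matching, a few long gaps are the expected pattern, and many scattered spaces are not. The sketch below makes that distinction concrete. It assumes the common convention that a gap is a maximal run of consecutive spaces in one row of an alignment, and it uses `-` as the space symbol. The function name `count_spaces_and_gaps` is hypothetical and chosen only for illustration.

```python
def count_spaces_and_gaps(row: str, space: str = "-") -> tuple[int, int]:
    """Return (number of spaces, number of gaps) in one row of an alignment.

    Assumption: a gap is a maximal run of consecutive space characters.
    """
    spaces = row.count(space)
    gaps = 0
    previous_was_space = False
    for ch in row:
        is_space = ch == space
        if is_space and not previous_was_space:
            gaps += 1  # a new run of spaces starts at this position
        previous_was_space = is_space
    return spaces, gaps


# Two rows with the same number of spaces but very different gap structure.
scattered = "AC-GT-AC-GT-AC"
clustered = "ACGTAC----GTAC"
print(count_spaces_and_gaps(scattered))  # (4, 4)
print(count_spaces_and_gaps(clustered))  # (4, 1)
```

Both rows contain four spaces, so a score that only counts spaces cannot tell them apart, even though only the second looks like the few long gaps described for cDNA matching.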
Recall that an RNA molecule is transcribed from the DNA of a gene. The RNA transcript is complementary to the
gene's DNA: each A in the gene is replaced by U in the RNA, each T by A,
each C by G, and each G by C. Moreover, the RNA transcript spans the entire gene, introns as
well as exons. Then, in a process that is not completely understood, each intron-exon boundary in
the transcript is located, the RNA regions corresponding to the introns are spliced out, and the RNA
regions corresponding to exons are concatenated. The resulting RNA
molecule is called the messenger RNA (mRNA): it leaves the cell nucleus and is used
to create the protein it encodes.
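The base-by-base substitution rule stated above can be written as a small translation table. This is a minimal sketch that applies the rule exactly as described in the text; the function name `transcribe` is hypothetical. As a simplification it does not reverse the string, so it ignores the antiparallel orientation of the two strands, and it does not model splicing.

```python
# Substitution rule from the text: A -> U, T -> A, C -> G, G -> C.
DNA_TO_RNA = str.maketrans("ATCG", "UAGC")


def transcribe(gene: str) -> str:
    """Apply the text's base-by-base complement rule to a DNA string.

    Simplification: the result is not reversed, so strand orientation
    is ignored, matching the description above.
    """
    gene = gene.upper()
    if set(gene) - set("ACGT"):
        raise ValueError("gene must contain only A, C, G, T")
    return gene.translate(DNA_TO_RNA)


print(transcribe("ATGCCGTA"))  # UACGGCAU
```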
Each cell (usually) contains a copy of all the chromosomes and hence of all the genes of the
entire individual, yet in each specialized cell (a liver cell for example) only a small
fraction of the genes are expressed. That is, only a small fraction of the proteins encoded in the genome are actually produced in that specialized cell. A standard method to determine which proteins are expressed in the specialized cell line, and to hunt for
the location of the encoding genes, involves capturing the mRNA in that cell after it leaves the
cell nucleus. That mRNA is then used to create a DNA sequence complementary to it. This sequence
is called cDNA (complementary DNA). Compared to the original gene, the cDNA sequence
consists only of the concatenation of exons in the gene. After cDNA is obtained, the problem is
to determine where the gene associated with that cDNA resides, and the problem becomes one of aligning
the cDNA sequence against the longer DNA sequence in a way that reveals the exons.
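To see why the desired alignment has the shape described earlier (long regions of high similarity separated by long gaps in the shorter sequence), it helps to run the process in reverse on a toy example. The sketch below assumes the exon coordinates are already known, which they are not in the real problem, where finding them is the purpose of the alignment. The gene string, the half-open exon intervals, and the helper names `splice` and `cdna_alignment_row` are all hypothetical. Following the text, the cDNA is taken to be the concatenation of the exons.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate the exon intervals [start, end) of a gene, in order."""
    return "".join(gene[start:end] for start, end in sorted(exons))


def cdna_alignment_row(gene: str, exons: list[tuple[int, int]], space: str = "-") -> str:
    """Lay the spliced sequence under the gene, with a space at every intron position."""
    row = [space] * len(gene)
    for start, end in exons:
        row[start:end] = gene[start:end]  # exon letters line up with the gene
    return "".join(row)


# Hypothetical gene: three exons separated by two introns (10 and 8 bases).
gene = "ATGCCC" + "GTAAGTTTAG" + "GGATCC" + "GTGAGCAG" + "TGA"
exons = [(0, 6), (16, 22), (30, 33)]

print(splice(gene, exons))              # ATGCCCGGATCCTGA
print(gene)                             # ATGCCCGTAAGTTTAGGGATCCGTGAGCAGTGA
print(cdna_alignment_row(gene, exons))  # ATGCCC----------GGATCC--------TGA
```

Each intron becomes a single long gap in the cDNA row, while the exons align without mismatches. With real data the exon regions would also contain a small fraction of mismatches and spaces.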
Peer Itsik
2000-11-20