Motivation

Many biologically motivated problems in computer science primarily involve sequences, or strings. For instance:

Reconstructing long sequences of DNA from overlapping sequence fragments.
Determining physical and genetic maps from probe data under various experimental protocols.
Storing, retrieving and comparing DNA sequences.
Comparing two or more sequences for similarities.
Searching databases for related sequences and subsequences.
Exploring frequently occurring patterns of nucleotides.
Finding informative elements in protein and DNA sequences.

Many of these research problems aim to learn about the function or structure of a protein without performing any experiments, and in fact without physically constructing the protein at all. The basic idea is that similar sequences produce similar proteins. Thus, to predict the characteristics of a protein from its sequence data alone, we can use the structure and function information that databases hold on known proteins with similar sequences. For instance, in protein folding it usually suffices that two protein sequences are identical at 25% of their positions for their three-dimensional structures to be almost identical. A classical example is the establishment of an association between cancer and uncontrolled cell growth [1]. This discovery was made by comparing the sequence of a cancer-associated gene against the sequences of proteins already known to influence cell growth. The correlation between the two sequences was very high, establishing the connection between cancer and cell growth.
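To make the notion of two sequences being "identical at 25% of their positions" concrete, here is a minimal Python sketch of a percent-identity computation. It assumes the two sequences are already aligned and have equal length, with no gap handling; the function name percent_identity and the two example fragments are made up for illustration and do not come from any real protein or from these notes.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree.

    Assumption: the sequences are already aligned position by position;
    gaps and unequal lengths are not handled here.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (align them first)")
    if not a:
        raise ValueError("sequences must be non-empty")
    # Count the positions where both sequences carry the same symbol.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Two invented 8-residue fragments, for illustration only.
print(percent_identity("MKTAYIAK", "MKSAYLGR"))  # 4 of 8 positions match -> 50.0
```

In practice the sequences first have to be aligned, since insertions and deletions shift positions; the position-by-position count above is only the final step.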

Peer Itsik
2000-11-20