Motivation

Many biologically motivated problems in computer science primarily involve sequences or strings. For instance:

Reconstructing long sequences of DNA from overlapping string fragments.
Determining physical and genetic maps from probe data under various experimental protocols.
Storing, retrieving and comparing DNA strings.
Comparing two or more strings for similarities.
Searching databases for related strings and substrings.
Exploring frequently occurring patterns of nucleotides.
Finding informative elements in protein and DNA sequences.

Many of these research problems aim at learning about the function or structure of a protein without performing any experiments, and indeed without having to physically construct the protein itself. The basic idea is that similar sequences produce similar proteins. Thus, to predict the characteristics of a protein from its sequence data alone, we can use the structure and function information available in databases for known proteins with similar sequences.

For instance, when considering protein folding, it usually suffices that two protein sequences are identical at 25% of their positions for their three-dimensional structures to be almost identical. A classical example is the establishment of an association between cancer and uncontrolled cell growth [1]. This discovery was enabled by comparing the sequence of a cancer-associated gene against the sequence of a protein that was already known to influence cell growth. The correlation between these two sequences was very high, establishing the connection between cancer and cellular growth.
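To make the "identical at 25% of their positions" criterion concrete, here is a minimal Python sketch that computes the percentage of matching positions between two sequences. It assumes the sequences are already aligned to the same length and ignores gaps, which is a simplification. The function name `percent_identity` and the two eight-residue fragments are hypothetical and chosen only for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length,
    pre-aligned sequences carry the same symbol."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    # Count the positions where both sequences agree.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Two made-up 8-residue fragments (not real proteins):
# they agree only at the first two positions, i.e. 2 of 8.
print(percent_identity("MKTAYIAK", "MKSWFLGR"))  # prints 25.0
```

In practice the sequences must first be aligned, since insertions and deletions shift positions; a position-by-position count like this one is meaningful only after that step.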

Itshack Pe'er
1999-01-03