Motivation
A large variety of biologically motivated problems in computer science
involve sequences, or strings. For instance:
- Reconstructing long sequences of DNA from overlapping string
fragments.
- Determining physical and genetic maps from probe data under
various experimental protocols.
- Storing, retrieving and comparing DNA strings.
- Comparing two or more strings for similarities.
- Searching databases for related strings and substrings.
- Exploring frequently occurring patterns of nucleotides.
- Finding informative elements in protein and DNA sequences.
Many of these research problems aim at learning about the function or structure of a protein without performing any experiments, and indeed without having to physically construct the protein itself.
The basic idea is that similar sequences produce similar proteins. Thus, in order to predict the characteristics of a protein using only its sequence data, we can use the structure and function information available in databases for known proteins with similar sequences.
For instance, when considering protein folding, it usually suffices that two protein sequences
are identical at 25% of their positions for their three-dimensional structures to be almost identical.
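The notion of two sequences being "identical at 25% of their positions" can be made concrete with a small sketch. It assumes the two sequences have already been aligned, have equal length and contain no gaps. The function name percent_identity and the example sequences are hypothetical and serve only as illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two aligned sequences agree.

    Assumption: a and b are already aligned, gap-free and of equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (align them first)")
    if not a:
        raise ValueError("sequences must be non-empty")
    # Count the positions where both sequences carry the same symbol.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical 8-residue fragments that agree at 5 of 8 positions.
print(percent_identity("MKTAYIAK", "MKSAYLAR"))  # 62.5
```

In practice the sequences must first be aligned, and conventions differ on how gap positions are counted, so this should be read only as the simplest case.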
A classical example is the establishment of an association between cancer and uncontrolled cell
growth [1]. This discovery was enabled by comparing the sequence of a cancer-associated gene
against the sequence of a protein that was already known to influence cell growth. The correlation
between these two sequences was very high, proving the connection between cancer and cellular
growth.
Itshack Pe'er
1999-01-03