Many biologically motivated problems in computer science
involve sequences or strings. For instance:
- Reconstructing long sequences of DNA from overlapping sequence
fragments.
- Determining physical and genetic maps from probe data under
various experimental protocols.
- Storing, retrieving and comparing DNA sequences.
- Comparing two or more sequences for similarities.
- Searching databases for related sequences and subsequences.
- Exploring frequently occurring patterns of nucleotides.
- Finding informative elements in protein and DNA sequences.
Many of these research problems aim at learning about the function or structure of a protein without performing any experiments, and indeed without physically constructing the protein itself.
The basic idea is that similar sequences produce similar proteins. Thus, to predict the characteristics of a protein from its sequence data alone, we can use the structure and function information available in databases for known proteins with similar sequences.
For instance, when considering protein folding, it usually suffices that two protein sequences
are identical at 25% of their positions for their three-dimensional structures to be almost identical.
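To make "identical at 25% of their positions" concrete, here is a minimal Python sketch. It assumes the two sequences have already been aligned to the same length, with `-` marking gaps. The function names `percent_identity` and `closest_known`, and the toy sequences, are made up for illustration and are not taken from any real database or library.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    # A gap aligned with a gap is not counted as a match.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def closest_known(query: str, known: dict[str, str]) -> tuple[str, float]:
    """Return the name and percent identity of the most similar known sequence."""
    name = max(known, key=lambda n: percent_identity(query, known[n]))
    return name, percent_identity(query, known[name])


# Toy, made-up sequences (not real proteins), already aligned to equal length.
known = {
    "toy_kinase": "MKTAYIAKQR",
    "toy_transporter": "GGSLLVPWEN",
}
print(closest_known("MKTAYLAKQK", known))  # ('toy_kinase', 80.0)
```

In practice the alignment itself has to be computed first, and database searches use statistical scores rather than raw identity. The sketch only illustrates the idea of carrying over what is known about the closest annotated sequence.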
A classical example is the establishment of an association between cancer and uncontrolled cell
growth [1]. This discovery was made by comparing the sequence of a cancer-associated gene
against the sequences of proteins already known to influence cell growth. The similarity
between the two sequences was very high, establishing the connection between cancer and cell
growth.
Peer Itsik
2000-11-20