The Experimental Problem
Recall that a gene is transcribed into mRNA, which is then translated into a
protein. To check which genes are expressed in a given tissue, we use
cDNA, a reverse transcript of the mRNA that is more stable than the mRNA
itself. There exist methods that enable us to extract cDNA in large quantities
from the tissue, so we can see which cDNA molecules exist in the tissue at a
given moment (details are omitted).
In fact, we sample cDNA molecules from the tissue. The more a gene is
expressed, the more samples of its matching cDNA we will find. The sample we
have obtained contains about 100,000 cDNA fragments, each of them between 500
and 2,500 base-pairs long, the average being around 1,200.
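The relation between expression level and sample counts can be illustrated with a small simulation. This is only a sketch: the gene names, the expression levels, and the helper `sample_fragments` are hypothetical, and it assumes each sampled fragment comes from a gene with probability proportional to that gene's expression level.

```python
import random
from collections import Counter

def sample_fragments(expression, n_samples, seed=0):
    """Draw n_samples gene labels, each with probability proportional
    to the gene's (hypothetical) expression level."""
    rng = random.Random(seed)
    genes = list(expression)
    weights = [expression[g] for g in genes]
    return Counter(rng.choices(genes, weights=weights, k=n_samples))

# Hypothetical relative expression levels; not taken from real data.
expression = {"geneA": 50.0, "geneB": 5.0, "geneC": 1.0}

# 100,000 matches the sample size mentioned in the text.
counts = sample_fragments(expression, 100_000)
for gene, n in counts.most_common():
    print(gene, n)
```

Under this assumption the fragment counts track the expression levels, which is why counting fragments per gene gives an estimate of abundance.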
Since the sampling was performed in the course of the transcription process,
the cDNA fragments we have from the same gene will not all be of the same
length. They will, however, all share a common endpoint (which is the starting
point of the mRNA).
We can now formulate the problem we face:
Problem 12.4
Determining gene expression
INPUT: Unsequenced cDNA fragments from a tissue
GOAL: Find which genes are present, and in what abundance
The simple solution is to sequence all the cDNA fragments we have extracted
from the tissue. This is both wasteful and slow. We have extracted a very
large quantity of cDNA, and many fragments come from the same genes.
Sequencing all of them will mean sequencing the same genes over and over
again.
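To put a number on this waste, the following sketch computes the fraction of sequencing runs that would repeat a gene already seen. The labels and the helper `redundant_fraction` are hypothetical, and the gene of origin of each fragment is assumed known here purely for illustration. In the actual problem it is not known before sequencing.

```python
from collections import Counter

def redundant_fraction(fragment_genes):
    """Fraction of fragments whose gene is already represented by
    another fragment, i.e. sequencing runs that add no new gene."""
    counts = Counter(fragment_genes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # One run per distinct gene is informative; the rest are repeats.
    return (total - len(counts)) / total

# Hypothetical toy sample: 10 fragments drawn from 3 genes.
labels = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
print(redundant_fraction(labels))  # 7 of the 10 runs repeat a gene
```

With a sample of about 100,000 fragments, the repeated fraction approaches 1 whenever the number of distinct genes is much smaller than the number of fragments.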
Itshack Pe`er
1999-03-16