
   
The Experimental Problem

Recall that a gene is transcribed into mRNA, which is then translated into a protein. To check which genes are expressed in a given tissue, we use cDNA, a reverse transcript of the mRNA that is more stable than the mRNA itself. There exist methods that let us extract cDNA in large quantities from the tissue, so that we can see which cDNA molecules exist in the tissue at a given moment (details are omitted). In effect, we sample cDNA molecules from the tissue: the more a gene is expressed, the more samples of its matching cDNA we will find. The sample we have obtained contains about 100,000 cDNA fragments, each between 500 and 2,500 base pairs long, with an average of around 1,200. Since the sampling was performed in the course of the transcription process, the cDNA fragments from the same gene will not all be of the same length; rather, they will all share a common endpoint (the starting point of the mRNA). We can now formulate the problem we face:

Problem 12.4   Determining gene expression
INPUT: Unsequenced cDNA fragments from a tissue
GOAL: Find which genes are present, and in what abundance

The simple solution is to sequence all the cDNA fragments we have extracted from the tissue, but this is both wasteful and slow. We have extracted a very large quantity of cDNA, and many fragments come from the same genes, so sequencing all of them would mean sequencing the same genes over and over again.
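
To get a feel for how redundant exhaustive sequencing would be, here is a minimal simulation sketch. It is not part of the original notes: the number of genes (1,000), the skewed expression levels, and the helper name sample_fragments are all assumptions chosen for illustration. Only the sample size of 100,000 fragments comes from the text above. Each fragment is drawn with probability proportional to the expression level of its gene.

```python
import random
from collections import Counter

def sample_fragments(expression, n_fragments, seed=0):
    """Draw gene labels with probability proportional to expression level.

    `expression` maps a (hypothetical) gene name to a relative expression
    level; each draw stands for one cDNA fragment sampled from the tissue.
    """
    rng = random.Random(seed)
    genes = list(expression)
    weights = [expression[g] for g in genes]
    return rng.choices(genes, weights=weights, k=n_fragments)

# Assumed expression profile: a few highly expressed genes, many rare ones.
expression = {f"gene{i}": 1.0 / (i + 1) for i in range(1000)}

# 100,000 fragments, as in the sample described in the text.
fragments = sample_fragments(expression, 100_000)

counts = Counter(fragments)                # fragments observed per gene
distinct = len(counts)                     # genes seen at least once
redundant = len(fragments) - distinct      # fragments whose gene was already seen

print(f"distinct genes sampled: {distinct}")
print(f"redundant sequencing runs: {redundant}")
print("most abundant:", counts.most_common(3))
```

Under these assumptions, at most 1,000 of the 100,000 sequencing runs could reveal a gene not already seen; the rest would only re-sequence known genes. The per-gene counts are exactly the abundance information the problem asks for, which suggests that grouping fragments by gene, rather than fully sequencing each one, is what we actually need.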
Itshack Pe'er
1999-03-16