Biological Background
Gene expression is the
biological process by which a DNA sequence generates a protein. It
involves two steps: transcription and translation.
Transcription produces an mRNA (messenger RNA) sequence using the
DNA sequence as a template. The mRNA sequence produced is
complementary to the DNA strand that was used as the template. The
subsequent process, called translation, synthesizes the protein
from the mRNA. This process is performed by subcellular elements
called ribosomes (Figure 7.2).
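The complementarity between the template strand and the mRNA can be illustrated with a minimal sketch. The names `DNA_TO_RNA` and `transcribe` are hypothetical and introduced only for illustration. The sketch assumes that the template strand is written 5' to 3', and it ignores everything else about transcription (promoters, termination, processing).

```python
# Hypothetical pairing table: DNA template base -> complementary RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5to3: str) -> str:
    """Return the mRNA (written 5'->3') complementary to a DNA template.

    Assumption: the template is given 5'->3'. The two strands are
    antiparallel, so the template is walked in reverse order.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))

# The template 5'-CAT-3' pairs with the mRNA 5'-AUG-3'.
print(transcribe("CAT"))        # AUG
print(transcribe("TTACATCAT"))  # AUGAUGUAA
```

Reversing the template before complementing keeps both sequences in the conventional 5' to 3' orientation.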
Transcription proceeds from the 5' end to the 3' end of the coding
DNA strand (the strand complementary to the template), which is also
the 5' to 3' direction of the mRNA. This direction along the strand is
called downstream, while the opposite direction is called upstream.
The enzyme performing the transcription, RNA polymerase, starts
transcription a few bases upstream of the start codon and
terminates a few bases downstream of the stop codon. The regions at
both ends of the DNA coding region that are transcribed into mRNA
but do not code for the protein are called untranslated regions
(UTRs) (see Figures 7.4 and 7.5). RNA polymerase molecules
start transcription by recognizing and binding to promoter
regions upstream of the desired transcription start sites. These
promoter regions control the rate of gene expression.
Figure 7.1: Steps in gene expression.
Figure 7.2: mRNA translation. The polypeptide chains are
elongated as the ribosomes move along the mRNA molecules, with the
5' end of the mRNA being translated first.
Figure 7.3: The genetic code. AUG is the start codon, while UAA, UAG
and UGA are the stop codons.
Since there are 64 different possible codons (triplets over four
bases) and only 20 amino acids, multiple codons represent the same
amino acid. Besides the codons coding for amino acids, there is one,
called the start codon, that indicates the beginning of translation
(as well as coding for the amino acid methionine), and three, called
stop codons, that indicate the end of translation. The genetic code
is shown in Figure 7.3. Because the codons are triplets of
bases, any given DNA sequence can be interpreted in three possible
ways, depending on where the coding starts. These three ways are
called reading frames. An open reading frame (ORF) is
a sequence of consecutive codons, in one reading frame, that contains
no stop codon.
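The codon count, the three reading frames, and the ORF definition above can be made concrete with a short sketch. All names here (`CODON_TABLE`, `codons`, `translate`, `open_reading_frames`) are hypothetical and chosen for illustration. The sketch assumes the standard genetic code written over the RNA alphabet, a single strand only, and an ORF taken as a maximal run of stop-free codons. It is not meant as a gene-finding procedure.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}

def codons(seq: str, frame: int) -> list:
    """Split seq into complete triplets, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def translate(mrna: str, frame: int = 0) -> str:
    """Translate codon by codon until the first stop codon or the sequence end."""
    protein = []
    for codon in codons(mrna, frame):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def open_reading_frames(seq: str, frame: int) -> list:
    """Return the maximal stop-free runs of codons in the given reading frame."""
    orfs, current = [], []
    for codon in codons(seq, frame):
        if codon in STOP_CODONS:
            if current:
                orfs.append(current)
            current = []
        else:
            current.append(codon)
    if current:
        orfs.append(current)
    return orfs

print(len(CODON_TABLE))            # 64 codons in total
print(sorted(STOP_CODONS))         # ['UAA', 'UAG', 'UGA']
print(translate("AUGGCUUAA"))      # MA
for frame in range(3):
    print(frame, codons("AUGGCUUAAGGU", frame))
```

The last loop shows how the same sequence yields three different codon lists, one per reading frame. Only frame 0 contains the stop codon UAA in this example.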
Figure 7.4: Typical prokaryotic gene structure at DNA level (not to
scale).
Figure 7.5: Typical eukaryotic gene structure at DNA level
(not to scale).
Itshack Pe'er
1999-02-03