Next: Finding Genes in Prokaryotes Up: Gene Finding Previous: Motivation

Biological Background

Gene expression is the biological process by which a DNA sequence generates a protein. It involves two steps: transcription and translation. Transcription produces an mRNA (messenger RNA) sequence using the DNA sequence as a template. The resulting mRNA sequence is complementary to the DNA strand that was used as the template. The subsequent process, called translation, synthesizes the protein according to the information coded in the mRNA. Translation is performed by subcellular elements called ribosomes (Figure [*]).

  
Figure: DNA $\rightarrow$ RNA $\rightarrow$ Protein [2].

Transcription proceeds from the 5' end to the 3' end of the copied DNA strand (that is, from 3' to 5' along the complementary template strand). This direction along the strand is called downstream, while the opposite direction is called upstream. The enzyme performing the transcription, RNA polymerase, starts transcription a few bases upstream of the region that actually codes for a protein, and terminates a few bases downstream of the end of that coding region. The regions at both ends of the DNA coding region that are transcribed into mRNA but do not code for the protein are called untranslated regions (UTRs) (see Figures [*] and [*]). RNA polymerase molecules start transcription by recognizing and binding to promoter regions upstream of the desired transcription start sites. These promoter regions control the rate of gene expression.

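The strand relationships described above can be illustrated with a minimal Python sketch. The function names `transcribe` and `coding_strand` and the example sequence are hypothetical, chosen only for illustration. The sketch assumes the template strand is written in its 3' to 5' direction, so that the resulting mRNA reads 5' to 3', and it ignores promoters, UTR boundaries and termination signals.

```python
# Base pairing from a DNA template base to the RNA base it specifies.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base pairing between the two DNA strands.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the mRNA (read 5'->3') for a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_strand(template_3to5: str) -> str:
    """Return the copied (coding) DNA strand, read 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5.upper())


template = "TACGGCATT"             # hypothetical template, written 3'->5'
mrna = transcribe(template)        # "AUGCCGUAA", read 5'->3'
coding = coding_strand(template)   # "ATGCCGTAA", read 5'->3'
```

In this sketch the mRNA is complementary to the template strand and identical to the copied strand except that U replaces T.
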
Proteins are composed of amino acids. The ribosomes produce sequences of amino acids by translating the information coded in the mRNA sequences. Each triplet of bases in the mRNA, called a codon, is a command for the ribosome. There are 64 possible codons but only 20 amino acids, so several codons can represent the same amino acid. The mapping from codons to amino acids, called the genetic code, is shown in Figure [*]. One of the codons, called the start codon, indicates the beginning of translation (and also codes for the amino acid methionine), and three others, called stop codons, indicate the end of translation. The ribosome scans the mRNA molecule, sliding along it from its 5' end to its 3' end. Upon detecting a start codon, the ribosome starts generating the amino acid sequence coded by the mRNA. The process stops when the ribosome detects a stop codon.
  
Figure: The genetic code. AUG is the start codon, while UAA, UAG and UGA are the stop codons [6].
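The translation process described above can be sketched in Python as follows. This is a simplified illustration, not a model of real ribosome behaviour: it assumes the standard genetic code, assumes translation begins at the first AUG encountered when reading from the 5' end, and uses one-letter amino acid symbols with `*` marking a stop codon. The names `GENETIC_CODE` and `translate` and the example sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for the standard genetic code, listed in
# the order UUU, UUC, UUA, UUG, UCU, ... GGG; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Map each of the 4**3 = 64 codons to its amino acid (or stop).
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (5'->3') from its first AUG up to a stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    # Read successive non-overlapping triplets in the frame set by AUG.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon ends translation
        protein.append(amino_acid)
    return "".join(protein)


protein = translate("GGAUGCCGUAAUU")  # codons AUG, CCG, then stop codon UAA
```

Here the two leading bases and the two trailing bases lie outside the translated region, playing the role of the untranslated regions.
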



Peer Itsik
2000-12-25