State length distributions
Different functional units of a gene have vastly
different lengths. For example, an average internal exon is about
150bp long, while introns on the order of 1kbp in length are not
uncommon. Thus, in our probabilistic model of gene structure,
different states need to have different length distributions.
Intron lengths are known to vary dramatically with the C+G
content. For example, the mean intron length for category I
(< 43% C+G) of the training set is 2069bp, as opposed to only
518bp for category IV (> 57% C+G) (see figure 7.10). Thus, the program uses separate
distributions for intron states in each category. The training set
also shows quite different length distributions for initial exons,
internal exons and terminal exons, so a separate distribution is
used for each. Note that the length of an internal exon has to be
consistent with the phases of its adjacent introns. For example, if
the preceding state is I2 and the succeeding state is I1, then the
generated internal exon length (for state E2 in this case) must be
3n+2 for some n. Therefore, n is drawn at random according to the
length distribution, and a string of length 3n+2 is then generated
according to the string generating model for that state.
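The phase constraint above can be sketched in a few lines of Python. This is only an illustration, not GENSCAN's actual code. It assumes the usual convention that an intron of phase k interrupts a codon after its k-th base, which agrees with the I2/I1 example in the text. The function names and the small table of codon counts with weights are hypothetical stand-ins for the trained exon length distribution.

```python
import random


def exon_length_residue(prev_intron_phase, next_intron_phase):
    """Return the required internal exon length modulo 3.

    Assumption: an intron of phase k splits a codon after its k-th base.
    The exon then starts with (3 - prev) % 3 bases that finish a codon and
    ends with `next` bases of an unfinished codon, so its length is
    congruent to (next - prev) modulo 3.
    """
    return (next_intron_phase - prev_intron_phase) % 3


def sample_exon_length(prev_intron_phase, next_intron_phase,
                       codon_counts, weights, rng=random):
    """Draw n from a (hypothetical) empirical distribution and return 3n + r."""
    n = rng.choices(codon_counts, weights=weights, k=1)[0]
    return 3 * n + exon_length_residue(prev_intron_phase, next_intron_phase)


# The example from the text: preceded by I2, followed by I1, so length = 3n+2.
residue = exon_length_residue(2, 1)
length = sample_exon_length(2, 1, codon_counts=[40, 50, 60],
                            weights=[0.3, 0.4, 0.3])
print(residue, length % 3)
```

Drawing n first and then forming 3n+r guarantees a phase-consistent length by construction, so no rejection step is needed.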
For the 5' UTR and 3' UTR states, geometric distributions with
mean values of 769bp and 457bp are used.
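As an illustration of the geometric length model for the UTR states, the sketch below turns a mean length into a geometric distribution and samples from it. It assumes lengths start at 1bp, so that the parameter is p = 1/mean; the text does not say which parameterization the program uses. The names UTR_MEAN_LENGTH, geometric_pmf and sample_geometric are hypothetical.

```python
import math
import random

# Mean UTR lengths in bp, as quoted in the text.
UTR_MEAN_LENGTH = {"5'UTR": 769, "3'UTR": 457}


def geometric_pmf(length, mean):
    """P(L = length) for a geometric distribution on 1, 2, 3, ...

    Assumption: support starts at 1, so the mean is 1/p.
    """
    p = 1.0 / mean
    return p * (1.0 - p) ** (length - 1)


def sample_geometric(mean, rng=random):
    """Draw one length by inverse-transform sampling."""
    p = 1.0 / mean
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p)) + 1


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_geometric(UTR_MEAN_LENGTH["5'UTR"], rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # should be close to 769
```

A geometric length distribution is exactly what a state with a fixed self-transition probability produces, so these states need no explicit length table.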
Itshack Pe`er
1999-02-03