Initial state probabilities
Figure 7.10: Gene density and structure as a function of C+G composition.
The initial probabilities of the states in the model should be
proportional to the frequencies with which the corresponding
functional units occur in actual human genomic data. For example, if
non-coding intergenic regions are estimated to make up 80% of the
sequence, then the initial probability of the state N (see figure
7.9) should be about 0.8. In practice, however, the relative
proportions of the functional units vary considerably with the C+G
content (isochore) of the genomic sequence (see figure 7.10). For
this reason, the GENSCAN training set is divided into four
categories according to the C+G content of each sequence. The
categories are:
- 1. (< 43% C+G)
- 2. (43 - 51% C+G)
- 3. (51 - 57% C+G)
- 4. (> 57% C+G)
For each of these categories, separate initial state probabilities
are computed by estimating the relative frequencies of various
functional units in these categories.
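
The procedure above can be illustrated with a minimal Python sketch. It
is not GENSCAN's actual training code. The function names, the input
format (a sequence paired with a list of (state, length) annotations)
and the choice to measure relative frequency by the number of bases
each state covers are all assumptions made for this example. The text
does not say which category a sequence lying exactly on a boundary
(43%, 51% or 57%) belongs to; the sketch assigns it to the higher
category.

```python
from collections import defaultdict

# Upper C+G bounds (in percent) of categories 1 to 3; anything at or above
# the last bound falls into category 4. Boundary handling is an assumption.
CG_BOUNDS = [43.0, 51.0, 57.0]


def cg_percent(seq):
    """Return the percentage of C and G bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    cg = sum(1 for base in seq if base in "CG")
    return 100.0 * cg / len(seq)


def cg_category(seq):
    """Map a sequence to one of the four C+G categories (1 to 4)."""
    pct = cg_percent(seq)
    for category, bound in enumerate(CG_BOUNDS, start=1):
        if pct < bound:
            return category
    return 4


def initial_state_probabilities(training_set):
    """Estimate initial state probabilities separately for each category.

    training_set: iterable of (sequence, annotation) pairs, where annotation
    is a list of (state_label, length_in_bases) tuples (hypothetical format).
    Returns {category: {state_label: probability}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for seq, annotation in training_set:
        category = cg_category(seq)
        for state, length in annotation:
            totals[category][state] += length

    probabilities = {}
    for category, counts in totals.items():
        n_bases = sum(counts.values())
        probabilities[category] = {
            state: count / n_bases for state, count in counts.items()
        }
    return probabilities
```

For instance, a single training sequence in category 1 annotated as 8
intergenic bases (state N) and 2 bases of some other state would give
state N an initial probability of 0.8 in that category, matching the
80% example given earlier.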
Itshack Pe`er
1999-02-03