Scoring Function

Let $f_b$ denote the expected frequency of the base $b$ in the genome (the background frequency), and let $f_{b,i}$ denote the frequency of base $b$ at position $i$ of the TATA-box, as given by the positional weight matrix. For a sequence $S=B_1B_2\ldots B_6$, the likelihood of observing it, given that it is a TATA-box, is:

$$P(S \mid S\mbox{ is a TATA-box})=\prod_{i=1}^{6}f_{B_i,i}$$

Similarly, the likelihood of observing it, given that it is not a TATA-box (a "non-promoter"), is approximated by its background probability:

$$P(S \mid S\mbox{ is not a TATA-box})\approx P(S) = \prod_{i=1}^{6}f_{B_i}$$

The log-likelihood ratio is therefore:

$$\log\left(\frac{P(S \mid \mbox{promoter})}{P(S \mid \mbox{non-promoter})}\right)= \sum_{i=1}^{6}\log\left(\frac{f_{B_i,i}}{f_{B_i}}\right)$$
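The log-likelihood ratio can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the positional frequencies in `PWM` are made-up placeholder values (not the values from the figure), the background is assumed to be uniform, and the names `PWM`, `BACKGROUND` and `log_likelihood_ratio` are hypothetical.

```python
import math

# Hypothetical positional frequencies f_{b,i} for a 6-base TATA-box model.
# Illustrative placeholders only, NOT the values from the figure.
PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},  # position 1
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},  # position 2
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},  # position 3
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},  # position 4
    {"A": 0.60, "C": 0.05, "G": 0.05, "T": 0.30},  # position 5
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},  # position 6
]

# Assumed uniform background frequencies f_b (an assumption for this sketch).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_likelihood_ratio(seq, pwm=PWM, background=BACKGROUND):
    """Return sum_i log(f_{B_i,i} / f_{B_i}) for a sequence of len(pwm) bases."""
    seq = seq.upper()
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the number of PWM positions")
    # Each position contributes independently, as in the formula above.
    return sum(
        math.log(pwm[i][base] / background[base])
        for i, base in enumerate(seq)
    )


if __name__ == "__main__":
    for candidate in ("TATAAA", "GCGCGC"):
        print(candidate, log_likelihood_ratio(candidate))
```

With these placeholder values, a sequence such as TATAAA gets a positive score (it is more likely under the TATA-box model than under the background), while GCGCGC gets a negative one. No frequency may be zero, since the logarithm would then be undefined.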

This model has the disadvantage that it does not exploit all of the known information: it treats the positions as independent, and so ignores, for example, dependencies between bases occurring in promoter regions. The positional frequencies $f_{b,i}$ are given in the figure below.

**Figure:** Positional weight matrix for TATA box.

Peer Itsik
2000-12-25