BLAST Score

BLAST scores rest on a well-developed statistical theory. It relies on the following assumptions:

- BLAST scores local ungapped alignments, for which the statistical theory is well understood.
- The database sequences are assumed to be evolutionarily unrelated, i.e., independent of one another.
- Each alignment starts at a specific pair of positions, one in the query and one in the database record.
- The scoring matrix must give a negative expected score for a random pair of residues, $\sum_{a,b} p_a p_b\,\sigma(a,b) < 0$, where $p_a$ is the background frequency of residue $a$. Were this not the case, long alignments would tend to have high scores regardless of whether the aligned segments were related, and the statistical theory would break down.
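The last assumption can be checked directly for a given scoring scheme. The sketch below computes the expected score $\sum_{a,b} p_a p_b\,\sigma(a,b)$; the uniform DNA background frequencies and the +2/-1 match/mismatch scores (borrowed from the random-walk example of Figure 5.10) are illustrative assumptions, not BLAST defaults.

```python
from itertools import product

# Illustrative assumption: uniform background frequencies over the DNA alphabet.
FREQS = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def match_mismatch(a: str, b: str, match: int = 2, mismatch: int = -1) -> int:
    """Score of aligning residue a with residue b (hypothetical +2/-1 scheme)."""
    return match if a == b else mismatch


def expected_score(freqs, score) -> float:
    """Average score of an unrelated residue pair: sum_{a,b} p_a p_b sigma(a,b)."""
    return sum(freqs[a] * freqs[b] * score(a, b) for a, b in product(freqs, repeat=2))


mu = expected_score(FREQS, match_mismatch)
print(mu)  # 4 matching pairs * 0.0625 * 2 + 12 mismatching pairs * 0.0625 * (-1) = -0.25
assert mu < 0, "the theory requires a negative expected score"
```

With a match probability of only 1/4, the mismatch penalty dominates and the average is negative, as required.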

**Figure 5.10:** Random walk: a match scores +2 and a mismatch is penalized by -1. As shown, the expected score per step is negative, so the walk drifts downward. The probability that the top score exceeds $x$ decreases exponentially with $x$.
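The exponential tail described in the figure can be reproduced with a small Monte Carlo simulation. This sketch assumes a match probability of 0.25 (uniform DNA), truncates every walk after a fixed number of steps, and uses a fixed seed so runs are repeatable; the printed estimates should fall off roughly geometrically in $x$.

```python
import random


def top_score(steps: int, p_match: float, rng: random.Random,
              match: int = 2, mismatch: int = -1) -> int:
    """Highest cumulative score reached by one random walk that starts at 0."""
    score = best = 0
    for _ in range(steps):
        score += match if rng.random() < p_match else mismatch
        best = max(best, score)
    return best


def tail_probabilities(x_max: int, trials: int = 5000, steps: int = 200,
                       p_match: float = 0.25, seed: int = 1) -> list[float]:
    """Empirical Pr(top score >= x) for x = 0..x_max."""
    rng = random.Random(seed)
    tops = [top_score(steps, p_match, rng) for _ in range(trials)]
    return [sum(t >= x for t in tops) / trials for x in range(x_max + 1)]


for x, p in enumerate(tail_probabilities(10)):
    print(f"Pr(top >= {x:2d}) ~ {p:.4f}")
```

Because the drift is negative, the maximum is usually reached early, so a few hundred steps per walk are enough for this illustration.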

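When searching a query of length $m$ against a database of total length $n$, one effectively performs $m \cdot n$ random-walk experiments, one per pair of starting positions, each with a probability of reaching score $S$ that decreases exponentially in $S$. Thus the E-value for score $S$ (the expected number of chance alignments scoring at least $S$) is $E = Kmn\,e^{-\lambda S}$, where $\lambda$ and $K$ are constants: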

- $\lambda$: a scaling factor;
- $K$: a correction for the dependency among the $mn$ experiments and for the bias of the scoring scheme.
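The notes treat $\lambda$ as a given constant. In the standard Karlin-Altschul theory it is the unique positive root of $\sum_{a,b} p_a p_b\,e^{\lambda\sigma(a,b)} = 1$. The sketch below finds that root by bisection for the illustrative +2/-1 DNA scheme with uniform frequencies (both assumptions); computing $K$ is more involved and is not attempted here.

```python
import math
from itertools import product

FREQS = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed background frequencies


def score(a: str, b: str) -> int:
    return 2 if a == b else -1  # illustrative +2/-1 scheme


def moment(lam: float) -> float:
    """f(lam) = sum_{a,b} p_a p_b exp(lam * sigma(a,b)) - 1."""
    return sum(FREQS[a] * FREQS[b] * math.exp(lam * score(a, b))
               for a, b in product(FREQS, repeat=2)) - 1.0


def solve_lambda(tol: float = 1e-12) -> float:
    """Bisection for the positive root of moment(lam) = 0.

    moment(0) = 0; it dips below zero because the expected score is negative,
    and it grows without bound because some score is positive.
    """
    lo, hi = 1e-6, 1.0
    while moment(hi) < 0:  # expand the upper bound until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if moment(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(solve_lambda())
```

For this particular scheme the equation reduces to $y^3 - 4y + 3 = 0$ with $y = e^{\lambda}$, whose positive root other than $y = 1$ is $y = (\sqrt{13} - 1)/2$, so the bisection result can be checked against $\lambda = \ln\frac{\sqrt{13}-1}{2} \approx 0.2645$.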

Note that the E-value depends on the lengths of the query and the database: the same alignment receives a different E-value when these lengths differ. Also, since the E-value decreases exponentially with the score, it is instructive to rescale it logarithmically; the resulting normalized score is called the bit score.
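To make the length dependence concrete, the sketch below evaluates $E = Kmn\,e^{-\lambda S}$ for one raw score against two database sizes. The values of $K$ and $\lambda$ are hypothetical placeholders, not constants of any real scoring matrix, and no edge-effect correction to $m$ or $n$ is applied.

```python
import math


def e_value(raw_score: float, m: int, n: int, K: float, lam: float) -> float:
    """Expected number of chance local alignments scoring >= raw_score: K*m*n*exp(-lam*S)."""
    return K * m * n * math.exp(-lam * raw_score)


# Hypothetical parameters, chosen only to illustrate the length dependence.
K, LAM = 0.1, 0.3
S, m = 40, 300
small_db = e_value(S, m, 1_000_000, K, LAM)
large_db = e_value(S, m, 10_000_000, K, LAM)
print(small_db, large_db)  # the E-value grows tenfold with the database size
```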

The bit score $B$ is defined from the E-value $E$ by $E = mn\,2^{-B}$. Since $E = Kmn\,e^{-\lambda S}$, it follows that $2^{-B} = K e^{-\lambda S}$, so the bit score is linear in the raw score $S$: $B = \frac{\lambda S - \ln K}{\ln 2}$.
In contrast to raw scores, which have little meaning without $K$ and $\lambda$, the bit score is measured in standard units (see e.g. [17]). Naturally, the significance of a given bit score still depends on the sizes of the query and the database.
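The conversion can be checked numerically: computing $B$ from a raw score and then $E = mn\,2^{-B}$ should reproduce $Kmn\,e^{-\lambda S}$. As before, $K$ and $\lambda$ below are hypothetical values used only for illustration.

```python
import math


def bit_score(raw_score: float, K: float, lam: float) -> float:
    """B = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value_from_bits(bits: float, m: int, n: int) -> float:
    """E = m*n*2^(-B); K and lam are no longer needed once B is known."""
    return m * n * 2.0 ** (-bits)


K, LAM = 0.1, 0.3  # hypothetical parameters
S, m, n = 40, 300, 1_000_000
B = bit_score(S, K, LAM)
print(B, e_value_from_bits(B, m, n), K * m * n * math.exp(-LAM * S))
```

The last two printed values agree up to floating-point rounding, which is the point of the bit score: once $B$ is known, only $m$ and $n$ are needed to recover $E$.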

As mentioned before, one can also ask for the P-value: the probability of observing, by chance alone, at least one record with a given E-value or lower.
Define the random variable $Y$ to be the number of pairs achieving E-value $E$ or better (smaller), i.e., the number of chance alignments scoring at least $S$.

$Y$ is Poisson distributed with mean $E$: $\Pr(Y = r) = \frac{e^{-E}E^{r}}{r!}$. In particular, $\Pr(Y = 0) = e^{-E}$ is the probability that no chance alignment reaches E-value $E$, i.e., that the best chance alignment scores below $S$. Thus the chance of finding zero alignments with score $\geq S$ is $e^{-E}$, and the probability of finding at least one such alignment is $1 - e^{-E}$. This is the P-value associated with the score $S$ (see e.g. [17]). Note that this model assumes an i.i.d. trial for each database position.
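The relation between the E-value and the P-value can be explored numerically. This sketch uses Python's `math.expm1` to evaluate $1 - e^{-E}$ accurately when $E$ is small, which is exactly the regime where the P-value and the E-value are nearly equal; for large $E$ the P-value approaches 1.

```python
import math


def poisson_pmf(r: int, E: float) -> float:
    """Pr(Y = r) when Y ~ Poisson(E)."""
    return math.exp(-E) * E ** r / math.factorial(r)


def p_value(E: float) -> float:
    """Pr(Y >= 1) = 1 - exp(-E): chance of at least one alignment scoring >= S."""
    return -math.expm1(-E)  # numerically stable form of 1 - exp(-E) for small E


for E in (10.0, 1.0, 0.01):
    print(f"E = {E:<5} P = {p_value(E):.6f}")
```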

Peer Itsik
2000-12-11