BLAST concentrates on finding regions of high local similarity in alignments without gaps, evaluated by an alphabet-weight scoring matrix. We next define some fundamental objects concerning BLAST:
Given two strings S1 and S2, a segment pair is a pair of equal-length substrings of S1 and S2, respectively, aligned without spaces.
A locally maximal segment pair is a segment pair whose alignment score (without spaces) cannot be improved by extending or shortening it.
A maximal segment pair (MSP) in S1 and S2 is a segment pair with the maximum score over all segment pairs in S1 and S2.
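The three definitions can be made concrete with a small sketch. It assumes a hypothetical toy scoring scheme (+5 for identical letters, -4 otherwise) in place of a real scoring matrix; the function names are illustrative, not part of BLAST. Every segment pair lies on one diagonal (fixed offset j - i), so the MSP can be found by running Kadane's maximum-subarray scan along each diagonal.

```python
from typing import Callable, Optional, Tuple

Score = Callable[[str, str], int]


def match_mismatch(a: str, b: str) -> int:
    """Hypothetical toy scoring scheme: +5 for identical letters, -4 otherwise."""
    return 5 if a == b else -4


def segment_score(s1: str, s2: str, i: int, j: int, length: int,
                  score: Score = match_mismatch) -> int:
    """Score of the gap-free segment pair s1[i:i+length] / s2[j:j+length]."""
    return sum(score(s1[i + k], s2[j + k]) for k in range(length))


def is_locally_maximal(s1: str, s2: str, i: int, j: int, length: int,
                       score: Score = match_mismatch) -> bool:
    """True if no segment pair on the same diagonal that contains, or is
    contained in, this one has a strictly higher score."""
    base = segment_score(s1, s2, i, j, length, score)
    d = j - i                        # diagonal offset shared by all candidates
    lo = max(0, -d)                  # first usable position in s1
    hi = min(len(s1), len(s2) - d)   # one past the last usable position in s1
    for a in range(lo, hi):
        for b in range(a + 1, hi + 1):
            contains = a <= i and b >= i + length
            contained = a >= i and b <= i + length
            if (contains or contained) and \
                    segment_score(s1, s2, a, a + d, b - a, score) > base:
                return False
    return True


def maximal_segment_pair(s1: str, s2: str, score: Score = match_mismatch
                         ) -> Optional[Tuple[int, int, int, int]]:
    """Return (score, i, j, length) of an MSP by scanning every diagonal."""
    best = None
    for d in range(-(len(s1) - 1), len(s2)):  # d = j - i
        start_i = max(0, -d)
        end_i = min(len(s1), len(s2) - d)
        run, run_start = 0, start_i
        for i in range(start_i, end_i):
            if run <= 0:
                run, run_start = 0, i  # a non-positive prefix never helps
            run += score(s1[i], s2[i + d])
            if best is None or run > best[0]:
                best = (run, run_start, run_start + d, i - run_start + 1)
    return best
```

For example, `maximal_segment_pair("GGACGTGG", "TTACGTTT")` returns `(20, 2, 2, 4)`, the aligned `ACGT` blocks; that segment pair is also locally maximal, while its 3-letter prefix is not.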
When comparing the query against every sequence in the database, BLAST attempts to find all database sequences whose MSP with the query scores above some cutoff S. S is chosen so that a random database sequence is unlikely to achieve a score higher than S when compared with the query.
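The notes do not say how S is computed. One standard approach, assumed here, uses the Karlin-Altschul result that the expected number of chance segment pairs scoring at least S is E = K m n exp(-λS), where m is the query length, n is the total database length, and K and λ depend on the scoring scheme and letter frequencies. Solving for S given a target E gives the sketch below. The values of K and λ are placeholders chosen for illustration only, not constants for any real scoring matrix.

```python
import math


def cutoff_score(e_target: float, m: int, n: int, K: float, lam: float) -> int:
    """Smallest integer S with K*m*n*exp(-lam*S) <= e_target."""
    return math.ceil(math.log(K * m * n / e_target) / lam)


def expected_chance_hits(S: float, m: int, n: int, K: float, lam: float) -> float:
    """Karlin-Altschul expected number of chance segment pairs scoring >= S."""
    return K * m * n * math.exp(-lam * S)


# Placeholder values: K and LAM must be derived from the actual scoring matrix
# and background letter frequencies; these numbers are illustrative only.
K, LAM = 0.1, 0.3
m, n = 300, 10**7  # query length, total database length
S = cutoff_score(0.01, m, n, K, LAM)
```

With these placeholder numbers S comes out to 81. Because S grows only logarithmically in m n, even a large database raises the cutoff modestly.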
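The BLAST algorithm proceeds in two stages:
1. Given a word length w and a threshold t, find all w-length words (substrings) of the database sequences that align with some w-length word of the query with a score of at least t. Each such match is called a hit.
2. Extend each hit in both directions, without gaps, to check whether it is contained in a segment pair whose score is above S.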
We may implement the first stage by constructing, for each w-length word α in the query sequence, all the w-length words whose similarity to α is at least t. We store these words in a data structure that is later accessed while scanning the database sequences.
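A minimal sketch of this first stage, assuming a toy DNA alphabet and the same hypothetical +5/-4 scoring as before. A Python dict stands in for the lookup structure (production implementations use more specialized structures), and a small w is used only to keep the example readable. Each query word's neighborhood is found by brute-force enumeration of all |Σ|^w words, which is feasible only for small w and alphabets.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple


def match_mismatch(a: str, b: str) -> int:
    """Hypothetical toy scoring scheme: +5 for identical letters, -4 otherwise."""
    return 5 if a == b else -4


def build_neighborhood(query: str, w: int, t: int, alphabet: str = "ACGT",
                       score: Callable[[str, str], int] = match_mismatch
                       ) -> Dict[str, List[int]]:
    """Map every w-letter word scoring >= t against some query word to the
    list of query offsets where that query word starts."""
    table = defaultdict(list)
    for q in range(len(query) - w + 1):
        qword = query[q:q + w]
        for letters in product(alphabet, repeat=w):  # all |alphabet|**w words
            word = "".join(letters)
            if sum(score(x, y) for x, y in zip(qword, word)) >= t:
                table[word].append(q)
    return dict(table)


def find_hits(table: Dict[str, List[int]], db_seq: str, w: int
              ) -> List[Tuple[int, int]]:
    """Scan a database sequence; each (query_offset, db_offset) pair is a hit."""
    hits = []
    for d in range(len(db_seq) - w + 1):
        for q in table.get(db_seq[d:d + w], []):
            hits.append((q, d))
    return hits
```

With t = 15 only exact matches qualify, so `build_neighborhood("ACGTAC", 3, 15)` holds just the four query words; lowering t to 6 also admits every single-mismatch word, giving 10 neighbors per query word. The table is built once per query and then reused for every database sequence.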
It is usually recommended to set w to 3 to 5 for amino acids and to a larger value for nucleotides.
Although BLAST does not allow alignments with indels, it has been shown that, with suitably chosen parameter values, it finds the correct alignments while requiring far less computation time than the standard dynamic programming method.
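BLAST's second stage extends each word hit along its diagonal, without gaps, and keeps the resulting segment pair only if its score is above S. The notes do not specify when an extension should stop; this sketch assumes an X-drop style rule, abandoning a direction once the running score falls more than a hypothetical `x_drop` below the best score seen so far. It reuses the toy +5/-4 scoring, and all names are illustrative.

```python
from typing import Callable, List, Tuple


def match_mismatch(a: str, b: str) -> int:
    """Hypothetical toy scoring scheme: +5 for identical letters, -4 otherwise."""
    return 5 if a == b else -4


def extend_hit(query: str, db_seq: str, q: int, d: int, w: int, x_drop: int = 10,
               score: Callable[[str, str], int] = match_mismatch
               ) -> Tuple[int, int, int, int]:
    """Extend the w-letter hit at (q, d) in both directions without gaps.
    Returns (score, q_start, d_start, length) of the best extension found."""
    seed = sum(score(query[q + k], db_seq[d + k]) for k in range(w))
    # Extend to the right, remembering the best-scoring extension length.
    best_right, run, right_len, k = 0, 0, 0, 0
    while q + w + k < len(query) and d + w + k < len(db_seq):
        run += score(query[q + w + k], db_seq[d + w + k])
        k += 1
        if run > best_right:
            best_right, right_len = run, k
        elif best_right - run > x_drop:
            break  # score has dropped too far below the best; stop extending
    # Extend to the left in the same way.
    best_left, run, left_len, k = 0, 0, 0, 0
    while q - k - 1 >= 0 and d - k - 1 >= 0:
        run += score(query[q - k - 1], db_seq[d - k - 1])
        k += 1
        if run > best_left:
            best_left, left_len = run, k
        elif best_left - run > x_drop:
            break
    total = seed + best_left + best_right
    return total, q - left_len, d - left_len, w + left_len + right_len


def report(query: str, db_seq: str, hits: List[Tuple[int, int]], w: int, S: int
           ) -> List[Tuple[int, int, int, int]]:
    """Extend every hit and keep distinct segment pairs scoring above S."""
    found = set()
    for q, d in hits:
        seg = extend_hit(query, db_seq, q, d, w)
        if seg[0] > S:
            found.add(seg)
    return sorted(found)
```

Two different hits on the same diagonal often extend to the same segment pair: here the `ACG` hit at (2, 2) and the `CGT` hit at (3, 3) both grow into `(20, 2, 2, 4)`, so results are deduplicated with a set.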