
   
General View of How the Program Works

Many known proteins can be grouped into families according to functional and sequence similarities (see [10]). Sequence similarity is far from uniform along the proteins of a family: some regions are clearly conserved, while others display little sequence similarity. The conserved regions are often crucial to the protein's function; enzymatic catalytic sites are one example. Such conserved regions can be used to probe an uncharacterized sequence for indications of its function.
Describing a protein family by its conserved regions focuses on the family's characteristic and distinctive sequence features, and so reduces noise compared with alignments that treat all positions uniformly. Databases of conserved features of protein families can be used to classify sequences from proteins, cDNAs and genomic DNA. The Blocks database was constructed from sequences of protein families using a fully automated procedure. Searching the Blocks database with a query sequence allows detection of one or more blocks representing a family.
The best set of blocks representing each protein group is found automatically by the two-step PROTOMAT system [3]. The first step is a motif finder. Currently the MOTIF algorithm [11] is used: MOTIF exhaustively evaluates spaced triplets of amino acids that are common to multiple sequences. A Gibbs sampling motif finder, which iteratively optimizes random 'seeds' for blocks [6], is also implemented. The MOTIF and Gibbs algorithms generate similar block sets for the sequences used in the Blocks database. The second step of PROTOMAT combines and refines the original blocks and assembles an optimal set of blocks that is consistently found in most of the sequences in the group.
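To make the idea of "spaced triplets common to multiple sequences" concrete, here is a minimal Python sketch. It is a simplified illustration only, not the actual MOTIF algorithm of [11], which has its own parameters and scoring. The function names and the parameters max_gap and min_support are hypothetical, introduced for this example. A spaced triplet is represented as (a1, d1, a2, d2, a3): three residues with distances d1 and d2 between consecutive ones.

```python
from collections import defaultdict


def spaced_triplets(seq, max_gap):
    """Yield every spaced triplet (a1, d1, a2, d2, a3) occurring in seq.

    d1 and d2 are the distances between consecutive residues of the
    triplet; each ranges over 1..max_gap (an assumed bound).
    """
    n = len(seq)
    for i in range(n):
        for d1 in range(1, max_gap + 1):
            j = i + d1
            if j >= n:
                break
            for d2 in range(1, max_gap + 1):
                k = j + d2
                if k >= n:
                    break
                yield (seq[i], d1, seq[j], d2, seq[k])


def common_triplets(sequences, max_gap=3, min_support=None):
    """Map each spaced triplet to the number of sequences containing it.

    Only triplets present in at least min_support sequences are kept;
    by default a triplet must occur in every sequence.
    """
    if min_support is None:
        min_support = len(sequences)
    support = defaultdict(int)
    for seq in sequences:
        # A set counts each triplet at most once per sequence.
        for triplet in set(spaced_triplets(seq, max_gap)):
            support[triplet] += 1
    return {t: c for t, c in support.items() if c >= min_support}


if __name__ == "__main__":
    # Toy sequences sharing the pattern A.C.E ('.' is any residue).
    toy = ["AXCXE", "QAYCZE", "AWCVEK"]
    print(common_triplets(toy, max_gap=2))
    # prints {('A', 2, 'C', 2, 'E'): 3}
```

In this toy input the only triplet shared by all three sequences is A, C, E at spacings 2 and 2. A real motif finder would go on to align the sequences around such a triplet and score the resulting candidate block; that step is omitted here.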
Itshack Pe'er
1999-01-17