This page is the supporting web site for the paper:
Searching the protein structure databank with weak sequence patterns
and structural constraints
Inge Jonassen [1], Ingvar Eidhammer [1], Svenn H. Grindhaug [1], William R. Taylor [1,2]
J. Mol. Biol. 304 (4), 597-617.
[1] Dept. of Informatics, University of Bergen, Norway
[2] Division of Mathematical Biology, Natl. Inst. for Medical Research, London, UK
A method is described in which proteins that match PROSITE patterns
are filtered by the root-mean-square deviation between the local 3D
structures of the probe and target over the pattern components.
This was found to increase the discrimination between true and false
members of the protein family, but the gain depended on how unique the
structural features in the pattern were compared to equivalent
fragments extracted from the structure databank. (For example,
if the pattern fell in an alpha-helix, then discrimination was poor.)
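The structural filter described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the pattern components of probe and target have already been reduced to equal-length lists of C-alpha coordinates, and it uses a distance RMSD (comparing intra-fragment distances) as a superposition-free stand-in for a fitted RMSD. The function names, the example coordinates and the cutoff value are all hypothetical.

```python
import math
from itertools import combinations

def distance_rmsd(probe, target):
    """Distance RMSD between two equal-length lists of (x, y, z) points.

    Compares every intra-fragment pairwise distance in the probe with the
    corresponding distance in the target, so no superposition is needed.
    This is a stand-in for a superposition-based RMSD, not the same measure.
    """
    if len(probe) != len(target) or len(probe) < 2:
        raise ValueError("need two fragments of equal length >= 2")
    squared = [
        (math.dist(probe[i], probe[j]) - math.dist(target[i], target[j])) ** 2
        for i, j in combinations(range(len(probe)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))

def passes_structure_filter(probe, target, cutoff):
    """Keep a sequence-pattern match only if its local structure is close."""
    return distance_rmsd(probe, target) <= cutoff

# Hypothetical coordinates (in Angstrom) for three pattern positions.
probe = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in probe]  # same shape
stretched = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 5.0, 0.0)]

CUTOFF = 1.0  # hypothetical threshold, chosen only for this example
print(passes_structure_filter(probe, shifted, CUTOFF))
print(passes_structure_filter(probe, stretched, CUTOFF))
```

A distance-based measure is invariant to translation and rotation, which keeps the sketch short, but it cannot distinguish a fragment from its mirror image; a fitted RMSD would.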
We then generalised the sequence patterns (by widening the range of
amino acids allowed at each position) and monitored how well the
structural information helped retain specificity. While the discrimination
of the pure sequence pattern had generally disappeared at information
content values less than 10 bits, the discrimination of the combined
sequence-structure probe remained high at this point before following a
similar decay. The 'gap' between these curves indicates that the
structural component is, on average, equivalent to about 10 bits.
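To make the notion of pattern information content concrete, here is a small sketch. It assumes a simple measure in which a position allowing k of the 20 amino acids, under a uniform background, contributes log2(20/k) bits and a wildcard contributes none; the paper may define information content differently. The pattern representation, the function names and the two example patterns are hypothetical.

```python
import math

ALPHABET_SIZE = 20  # the standard amino acids

def position_bits(allowed):
    """Bits contributed by one pattern position.

    `allowed` is a string of permitted one-letter residue codes, or "x"
    for a wildcard. Assumes a uniform background distribution.
    """
    if allowed == "x":
        return 0.0
    return math.log2(ALPHABET_SIZE / len(set(allowed)))

def pattern_bits(positions):
    """Total information content of a pattern, summed over positions."""
    return sum(position_bits(p) for p in positions)

# A hypothetical pattern and a generalised version of it, obtained by
# widening the set of residues allowed at each non-wildcard position.
strict = ["C", "x", "x", "DE", "H"]
relaxed = ["CS", "x", "x", "DENQ", "HKR"]

print(round(pattern_bits(strict), 2))
print(round(pattern_bits(relaxed), 2))
```

Under this measure, widening the allowed residues at any position can only lower the total, which is the sense in which a generalised pattern becomes "weaker".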
The sequence patterns were also filtered using the structure
comparison program SAP, giving a global, rather than local, 'view'
of the proteins. This allowed the sequence patterns to become even
less specific (lower information content) but raised the problem
of whether proteins encountered with the same fold but no
PROSITE pattern should count as family members.
The links below are to two pages. The first contains a set of PROSITE
families that have been analysed using both ComPats (PROSITE patterns
extended with a structural probe) and SAPpat (structure alignment guided by
the PROSITE patterns). The second is a list of families analysed using
ComPats only. The plots are clickable: clicking on a point in a plot
should give a description of the corresponding pattern (and probe, if
applicable). Only results from a "pure pattern" and ComPats are included
in the list produced when a plot is clicked (no SAPpat or PHI-BLAST
results are given).
Page maintained by Inge Jonassen