Finding flexible patterns in unaligned protein sequences.
Inge Jonassen, John F. Collins, Desmond Higgins
Protein Science 1995;4(8):1587-1595
Communication to:
Inge Jonassen,
Department of Informatics,
University of Bergen
HIB
N5020 Bergen
Norway
e-mail: inge@ii.uib.no.
Keywords:
flexible gaps, patterns, protein families, PROSITE.
Abstract:
We present a new method for the identification of
conserved patterns in a set of unaligned related
protein sequences. It can discover patterns of a
quite general form, allowing both ambiguous positions
and variable-length wildcard regions. It
allows the user to define a class of patterns (e.g.,
the degree of ambiguity allowed and the length and
number of gaps), and the method is then guaranteed to
find the conserved patterns in this class that score
highest according to a defined significance measure.
Identified patterns may be refined using one of two new
algorithms. We present a new (nonstatistical)
significance measure for flexible patterns. The method
is shown to recover known motifs for PROSITE families
and is also applied to some recently described families
from the literature.
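
To make the pattern class concrete, the following minimal sketch shows what
a flexible pattern with an ambiguous position and a variable-length wildcard
region looks like when written in PROSITE-style notation and matched against
unaligned sequences. It is an illustration only, not the discovery algorithm
or the significance measure of the paper. The function names, the example
pattern, and the toy sequences are all hypothetical, and only a subset of
the PROSITE syntax is handled (single residues, [..] ambiguity sets, x,
x(n), and x(m,n)).

```python
import re

def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Supported elements (hypothetical subset, separated by '-'):
      A        a single residue
      [DE]     an ambiguous position (any residue in the set)
      x        a one-residue wildcard
      x(n)     a fixed-length wildcard region
      x(m,n)   a variable-length (flexible) wildcard region
    """
    parts = []
    for element in pattern.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if wildcard:
            low, high = wildcard.group(1), wildcard.group(2)
            if low is None:
                parts.append(".")
            elif high is None:
                parts.append(".{%s}" % low)
            else:
                parts.append(".{%s,%s}" % (low, high))
        elif re.fullmatch(r"\[[A-Z]+\]|[A-Z]", element):
            parts.append(element)
        else:
            raise ValueError("unsupported pattern element: %r" % element)
    return "".join(parts)

def count_matching(pattern, sequences):
    """Count how many sequences contain at least one match of the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))

# Toy, made-up sequences; the motif occurs at different offsets and with
# different gap lengths, so no alignment is needed to test conservation.
toy_family = ["MKCAADHLL", "TTCGGGGEHPP", "ACWYVEHK"]
toy_pattern = "C-x(2,4)-[DE]-H"

regex_text = prosite_to_regex(toy_pattern)
support = count_matching(toy_pattern, toy_family)
print(regex_text, support)
```

Here the pattern is conserved in the toy set because it matches every
sequence, while a sequence such as CADH would not match, since its gap
between C and D is shorter than the minimum wildcard length of 2.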