Relation Patterns and their Automatic Discovery in Biosequences

A. Brazma, I. Jonassen, I. Eidhammer, E. Ukkonen

Submitted to CABIOS


Experiments

PROSITE families

Outline of experiment:

The figure illustrates the main outline of the experiments done with the PROSITE families:

Results:

Relation patterns found for the PROSITE families using the PROSITE patterns. Restrictions applied, and no noise assumed:
  1. C4=10
  2. C4=15
  3. C4=20
Relation patterns found for the PROSITE families using patterns found by Pratt. Restrictions applied, and no noise assumed:
  1. C4=15
  2. C4=20

Homeodomain family

We analysed the 1enh entry of the HSSP database. We removed the last column of the alignment (which consists mostly of gaps) and then removed all sequences containing at least one gap. A pattern was then constructed with one position per alignment column, each position matching all amino acids found in that column:

[EGKPRSW]-[ACEGHIKLMNPQRSTVY]-[CGKLQRSY]-[ACHIKMPQTV]-[ADFIKLNPRSTV]-[ACFHILNY]-[HKNST]-[ADGHKLNPQRSTVY]-[ADEFHKLNPQRSTVWY]-[AQSY]-[AILRTV]-[ACDEFGIKLPQRSTVY]-[AEGHIKLQRSTV]-[LM]-[EKNQ]-[ACEGHIKLMNQRSTV]-[AEFHIKLQRSTVY]-[FY]-[ACDEFHIKLNQRSTY]-[ACDEFGHIKLMNQRSTVY]-[ADEGHKNQST]-[AEGHKMNPQRSV]-[FHKNTY]-[ILMPV]-[ACDEGMNSTVY]-[ACEFGIKLPRSVY]-[ACDEFGHKLNPQRSTVY]-[ADEHIKMQRTV]-[AKLRW]-[ACDEFHIKLMQRSTVWY]-[ACDEGHIKLMNQRSTV]-[FILMVY]-[ARS]-[ADEGHKLMNQRSTV]-[ADEFGHIKLMNQRSTVY]-[AILSTV]-[ACDEGHKMNQRSTVY]-[ILM]-[ACDGHLNPQRST]-[ADEKMPQST]-[ACDEKNQRSTVY]-[HKNQRTV]-[FILV]-[AEKQRT]-[FILSTV]-W-[FY]-[KQS]-N-[AHKNR]-[ARS]-[AIMNRSTVY]-[KQR]
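
The construction described above can be illustrated with a minimal Python sketch. It is not the code used for the experiments. The function name build_column_pattern, the choice of "-" and "." as gap characters, and the toy alignment are all assumptions made for illustration, and parsing of the HSSP file itself is not shown.

```python
from typing import List

# Assumption: gaps in the alignment rows are written as '-' or '.'.
GAP_CHARS = set("-.")


def build_column_pattern(alignment: List[str], drop_last_column: bool = True) -> str:
    """Build a PROSITE-style pattern with one position per alignment column.

    Steps, following the description in the text:
      1. optionally drop the last alignment column,
      2. discard every sequence that still contains a gap,
      3. for each remaining column, collect the set of amino acids seen.
    """
    rows = [row[:-1] for row in alignment] if drop_last_column else list(alignment)

    # Keep only gap-free sequences.
    rows = [row for row in rows if not any(ch in GAP_CHARS for ch in row)]
    if not rows:
        raise ValueError("no gap-free sequences remain")

    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("alignment rows must all have the same length")

    positions = []
    for column in zip(*rows):
        residues = "".join(sorted(set(column)))
        # A single residue is written bare, several residues in brackets.
        positions.append(residues if len(residues) == 1 else f"[{residues}]")
    return "-".join(positions)


# Hypothetical toy alignment (not taken from the 1enh entry).
toy_alignment = [
    "KRW-",
    "RRWA",
    "K-W-",  # still has a gap after the last column is dropped, so it is removed
    "SQWA",
]

print(build_column_pattern(toy_alignment))
```

By construction, every gap-free sequence kept from the alignment matches the resulting pattern, which makes it the loosest position-by-position description of those sequences.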

This pattern, together with the alignment, was given as input to algorithms A1-A4 described in the paper. These tables show some of the results obtained.


Poster presented at RECOMB 97
