MDL test case: four small PROSITE families

The families:

  1. TP1.
  2. TP2_1.
  3. SAR1.
  4. SPASE_II.

The sequences are given in a separate sequence file. For comparison, we used Clustal W to generate a multiple alignment and a phylogenetic tree (drawn with Phylip). The four families are easy to identify in the tree.

The unaligned sequences were given to MDL-Pratt with parameters (c0,c1,c2,c3)=(10,10,3,10) and a logarithmic gap penalty. MDL-Pratt automatically recovered the four families, returning one conserved pattern for each:

  1. STP1_BOVIN STP1_HUMAN STP1_MOUSE STP1_PIG STP1_RAT STP1_SHEEP
    94.94172 187.60838 
    S-T-S-R-K-L-K-[TS]-[HQ]-G-[MT]-R-R-[GS]-K-[SN]-R-[TAS]-P-H-K-G-V-K-R-x(0,1)-G-x(0,1)-S-K-R-K-Y-R-K-[GS]-[VSN]-L-K-S-R-K-R-[CG]-D-D-A-[SN]-R-N
    
  2. SAR1_ARATH SAR1_SCHPO SAR1_YEAST SARA_MOUSE
    44.13033 172.88033 
    L-G-L-D-N-A-G-K-T-T-L-L-[HQ]-M-L-K-x(0,1)-D-x(0,1)-R-L-[VAG]-x-[LMH]-x-P-T-x-H-P-T-S-E-E-L-[TAS]-I-[AG]-[KGN]-[IVM]-[KTR]-F-[KT]-[TA]-F-D-L-G-G-H
    
  3. STP2_BOVIN STP2_HUMAN STP2_MOUSE STP2_PIG STP2_RAT
    24.94163 96.54163 
    K-[KN]-R-K-[TN]-[LVF]-E-G-K-[LVA]-x-K-[KR]-K-x-[IVA]-[KR]-R-x-[KQ]-[QR]-[VT]-[YH]-[KR]-[TAR]-x-[TR]-[QR]-[TS]-x(2)-[WR]-[KR]-x-[KN]
    
  4. LSPA_ECOLI LSPA_ENTAE LSPA_PSEFL LSPA_STAAU
    12.32081 125.07081 
    I-x(2,3)-L-[IVF]-[ILA]-G-x(0,1)-A-L-G-N-[LF]-[IFY]-D-R-[IL]-x(2)-G-[FHE]-V-[IV]-D-[MF]-I-x-[VFT]-x-[IVW]-x-[GDN]-[YWR]-[HD]-[FY]-[FAP]-[ITP]-[FA]-x(2)-[FA]-[AD]-[TSD]-[AS]-[ILA]-[ITC]-[IVT]-[VG]
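The patterns above use PROSITE syntax. Elements are separated by "-". A capital letter is a fixed residue, [..] accepts any one of the listed residues, and x is any residue. A suffix such as (2) or (0,1) gives a repeat count or range. The Python sketch below shows how such a pattern could be translated into a regular expression so it can be scanned against sequences. The function name `prosite_to_regex` is hypothetical and is not part of MDL-Pratt. The sketch covers only the constructs that appear on this page. Other PROSITE features, such as {..} exclusions and the < and > anchors, are rejected rather than translated.

```python
import re

# One PROSITE element: x, a single residue, or a residue class [..],
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string (sketch).

    Only the element types used in the MDL-Pratt output above are supported;
    anything else raises ValueError instead of being silently mistranslated.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        symbol, low, high = m.groups()
        core = "." if symbol == "x" else symbol  # x matches any residue
        if low is None:
            parts.append(core)
        elif high is None:
            parts.append(f"{core}{{{low}}}")          # x(2)   -> .{2}
        else:
            parts.append(f"{core}{{{low},{high}}}")   # x(0,1) -> .{0,1}
    return "".join(parts)


# Example: the STP2 pattern reported above, compiled for scanning sequences.
stp2 = ("K-[KN]-R-K-[TN]-[LVF]-E-G-K-[LVA]-x-K-[KR]-K-x-[IVA]-[KR]-R-x-[KQ]-"
        "[QR]-[VT]-[YH]-[KR]-[TAR]-x-[TR]-[QR]-[TS]-x(2)-[WR]-[KR]-x-[KN]")
stp2_regex = re.compile(prosite_to_regex(stp2))

print(prosite_to_regex("I-x(2,3)-L-[IVF]"))  # → I.{2,3}L[IVF]
```

To check which sequences a pattern covers, one would call `stp2_regex.search(seq)` on each sequence read from the sequence file. The file parsing is left out here.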
    

Page compiled by: Inge Jonassen.
