A Constraint Based Structure Description Language for Biosequences

Ingvar Eidhammer, David Gilbert, Inge Jonassen, Madu Ratnayake

Dept. of Informatics, Univ. of Bergen, Reports in Informatics no 133, May 1997.
Also: Report 1997/04 from Dept. of Computer Science, City University, London, UK.


We investigate how constraint-solving techniques can be used to search for patterns in sequences (strings) of symbols over a finite alphabet. We define a constraint-based structure description language for biosequences and an algorithm that solves the structure searching problem as a constraint satisfaction problem (CSP). The methodology can describe two-dimensional structures of biosequences such as tandem repeats, stem loops, palindromes and pseudo-knots. We also report on an implementation of the language in the constraint logic programming language clp(FD), with test results for a simple searching algorithm, and outline ideas for implementing the CSP structure searching algorithm in C++.
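
To illustrate what treating a structure search as a CSP can look like, here is a minimal Python sketch for one of the structures named above, a stem loop in a DNA sequence. It is not the report's description language, nor its clp(FD) or C++ algorithm. The function names, the choice of variables (start position, stem length, loop length) and the naive generate-and-test search are assumptions made for this example only.

```python
from itertools import product

# Watson-Crick complement for DNA symbols (illustrative alphabet).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def find_stem_loops(seq, stem_lengths, loop_lengths):
    """Hypothetical helper: enumerate (start, stem, loop) assignments.

    Variables and their finite domains:
      start in 0..len(seq)-1, stem in stem_lengths, loop in loop_lengths.
    Constraints:
      1. the whole structure fits inside the sequence;
      2. the right arm is the reverse complement of the left arm.
    """
    solutions = []
    for start, stem, loop in product(range(len(seq)), stem_lengths, loop_lengths):
        end = start + 2 * stem + loop
        if end > len(seq):  # constraint 1 violated
            continue
        left = seq[start:start + stem]
        right = seq[start + stem + loop:end]
        if right == reverse_complement(left):  # constraint 2 satisfied
            solutions.append((start, stem, loop))
    return solutions
```

Calling `find_stem_loops(seq, [3], [4])` returns every position where a 3-symbol stem encloses a 4-symbol loop. A real constraint solver would prune the domains by propagation rather than enumerating every assignment, which is the gap between this sketch and a true CSP search.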

Keywords: constraints, biostructures, description language, searching.

The complete technical report is available.

The software (both web-based and stand-alone) is also available.