A Constraint Based Structure Description Language for Biosequences

Ingvar Eidhammer, David Gilbert, Inge Jonassen, Madu Ratnayake

Dept. of Informatics, Univ. of Bergen, Reports in Informatics no 133, May 1997.
Also: Report 1997/04 from Dept. of Computer Science, City University, London, UK.

Abstract:

We investigate how constraint-solving techniques can be used to search for patterns in sequences (strings) of symbols over a finite alphabet. We define a constraint-based structure description language for biosequences and specify an algorithm that solves the structure-searching problem as a constraint satisfaction problem (CSP). The methodology can describe two-dimensional structures of biosequences, such as tandem repeats, stem loops, palindromes and pseudo-knots. We also report on an implementation of the language in the constraint logic programming language clp(FD), with test results for a simple searching algorithm, and outline ideas for implementing the CSP structure-searching algorithm in C++.
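
To make the idea of structure searching as constraint satisfaction concrete, the following is a minimal illustrative sketch in Python. It is not the description language or the algorithm of the report. It assumes an RNA alphabet with Watson-Crick pairing only, and it models a stem loop with three finite-domain variables (start position, stem length, loop length) that must satisfy a bounds constraint and a base-pairing constraint. The function name `find_stem_loops`, its parameters and the example sequence are hypothetical, and the search is plain generate-and-test rather than a real constraint solver.

```python
from itertools import product

# Assumption: RNA alphabet with Watson-Crick pairs only (no G-U wobble pairs).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_stem_loops(seq, stem_lengths, loop_lengths):
    """Return every (start, stem_len, loop_len) that satisfies the constraints.

    Variables and their finite domains:
      start    in 0 .. len(seq) - 1
      stem_len in stem_lengths
      loop_len in loop_lengths
    """
    solutions = []
    for start, stem, loop in product(range(len(seq)), stem_lengths, loop_lengths):
        right = start + stem + loop  # first position of the second arm
        # Bounds constraint: the whole structure must fit inside the sequence.
        if right + stem > len(seq):
            continue
        # Pairing constraint: base k of the first arm must pair with
        # base (stem - 1 - k) of the second arm.
        if all(
            COMPLEMENT[seq[start + k]] == seq[right + stem - 1 - k]
            for k in range(stem)
        ):
            solutions.append((start, stem, loop))
    return solutions


# Hypothetical example: arm GGGC, loop AAAA, arm GCCC.
hits = find_stem_loops("AAGGGCAAAAGCCCUU", stem_lengths=range(4, 5), loop_lengths=range(3, 6))
print(hits)  # [(2, 4, 4)]
```

A constraint solver would prune the variable domains by propagation instead of enumerating every assignment, but the variables and constraints would be stated in the same way.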

Keywords: constraints, biostructures, description language, searching.

The complete technical report is available.

The software (both web-based and stand-alone) is also available.