Methods for finding motifs in sets of related biosequences
Dr. scient. thesis
Dept. of Informatics,
University of Bergen,
The automatic discovery of patterns conserved in groups of related
biological sequences is an important problem in molecular biology.
This thesis discusses this problem, and presents a systematisation
of a large number of reported methods.
New methods are proposed for the automatic discovery of patterns and
collections of patterns in sets of unaligned protein sequences.
The methods are able to discover patterns of a quite general type,
and are guaranteed to find the conserved patterns that score best
according to a defined evaluation function.
Both non-heuristic and heuristic search methods are proposed.
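To make the idea of a guaranteed (non-heuristic) search concrete, here is a minimal sketch, assuming a deliberately simple pattern language: fixed residues separated by runs of up to `max_gap` wildcards (`x`). It is not the algorithm developed in the thesis. It uses a standard technique: exhaustive depth-first extension of patterns, pruned by support (the number of sequences that contain a match). Because extending a pattern can never increase its support, pruning does not lose any qualifying pattern. The names `best_pattern`, `min_support`, `max_len` and `max_gap`, as well as the toy evaluation function (number of fixed positions), are hypothetical and used only for illustration.

```python
from typing import List, Optional, Tuple

WILDCARD = "x"  # lowercase 'x' = any residue; sequences are assumed uppercase


def matches_at(pattern: str, seq: str, start: int) -> bool:
    """True if the pattern matches seq beginning at index start."""
    if start + len(pattern) > len(seq):
        return False
    return all(p == WILDCARD or p == s for p, s in zip(pattern, seq[start:]))


def support(pattern: str, seqs: List[str]) -> int:
    """Number of sequences containing at least one match of the pattern."""
    return sum(
        any(matches_at(pattern, s, i) for i in range(len(s) - len(pattern) + 1))
        for s in seqs
    )


def score(pattern: str) -> int:
    """Toy evaluation function (an assumption): count of fixed positions."""
    return sum(1 for c in pattern if c != WILDCARD)


def best_pattern(
    seqs: List[str], min_support: int, max_len: int = 10, max_gap: int = 2
) -> Optional[Tuple[int, str]]:
    """Exhaustive depth-first search; returns (score, pattern) for the best
    pattern matched by at least min_support sequences, or None."""
    alphabet = sorted(set("".join(seqs)))
    best: Optional[Tuple[int, str]] = None

    def extend(pattern: str) -> None:
        nonlocal best
        if best is None or score(pattern) > best[0]:
            best = (score(pattern), pattern)
        # Append 0..max_gap wildcards followed by one fixed residue.
        for gap in range(max_gap + 1):
            for a in alphabet:
                cand = pattern + WILDCARD * gap + a
                # Support can only shrink as a pattern grows, so a candidate
                # below min_support can be pruned with all its extensions.
                if len(cand) <= max_len and support(cand, seqs) >= min_support:
                    extend(cand)

    for a in alphabet:
        if support(a, seqs) >= min_support:
            extend(a)
    return best


example = ["MKVLAAGH", "QKVLSAGW", "PKVLTAGR"]
print(best_pattern(example, min_support=3))  # (5, 'KVLxAG')
```

Exhaustive enumeration like this is exact but grows quickly with alphabet size, gap length and pattern length; that cost is what motivates the heuristic alternatives mentioned above.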
The problem of evaluating discovered patterns is discussed
and several new evaluation functions are proposed.
The new functions are shown to have useful properties for a
set of test cases.
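As an illustration of what an evaluation function can look like, the sketch below computes the information content of a PROSITE-style pattern. This is a common baseline, not one of the new functions proposed in the thesis. It assumes a uniform background distribution over the 20 standard amino acids, so a position allowing k residues contributes log2(20/k) bits and a wildcard contributes nothing; practical scores usually use observed residue frequencies instead. The parser handles only a subset of PROSITE syntax (single residues, `x`, `[..]` and `{..}`, without repeat counts), and the function names are hypothetical.

```python
import math
from typing import List, Set

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def parse_pattern(pattern: str) -> List[Set[str]]:
    """Parse a simplified PROSITE-style pattern, e.g. 'C-x-[LIVM]-{P}-G'.
    Supports single residues, 'x', [..] (any of) and {..} (none of);
    repeat counts such as x(2,4) are deliberately not handled."""
    positions: List[Set[str]] = []
    for elem in pattern.rstrip(".").split("-"):
        if elem == "x":
            positions.append(set(AMINO_ACIDS))
        elif elem.startswith("[") and elem.endswith("]"):
            positions.append(set(elem[1:-1]))
        elif elem.startswith("{") and elem.endswith("}"):
            positions.append(set(AMINO_ACIDS) - set(elem[1:-1]))
        elif len(elem) == 1 and elem in AMINO_ACIDS:
            positions.append({elem})
        else:
            raise ValueError(f"unsupported pattern element: {elem!r}")
    return positions


def information_bits(pattern: str) -> float:
    """Information content under a uniform background over 20 residues:
    each position allowing k residues contributes log2(20 / k) bits."""
    return sum(math.log2(len(AMINO_ACIDS) / len(s)) for s in parse_pattern(pattern))


print(round(information_bits("C-x-C"), 2))       # 8.64
print(round(information_bits("[LIVM]-x-G"), 2))  # 6.64
```

A score of this kind rewards specific patterns but ignores how many of the input sequences they match, so on its own it is usually combined with a support requirement such as the one in a search procedure.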
The methods proposed in this thesis have been primarily
designed for analysing protein sequences, but they
may also be applicable to the analysis of nucleotide
(DNA/RNA) sequences and possibly other types of sequence data.
Keywords: bioinformatics, protein sequences, pattern discovery, machine learning,
search methods, PROSITE, minimum description length principle
The thesis consists of:
An introductory part (full text available online).
Research papers:
Approaches to the automatic discovery of patterns in biosequences.
Alvis Brazma, Inge Jonassen, Ingvar Eidhammer, David Gilbert.
Finding flexible patterns in unaligned protein sequences.
Inge Jonassen, John F. Collins, Desmond Higgins.
Efficient discovery of conserved patterns using a pattern graph.
Scoring function for pattern discovery programs taking into account sequence diversity.
Inge Jonassen, Carsten Helgesen, Desmond Higgins.
Discovering patterns and subfamilies in biosequences.
Alvis Brazma, Inge Jonassen, Esko Ukkonen, Jaak Vilo.