Software downloads

Note that this page remains for historical reasons, and that the software now has a new home at malde.org.

RSelect

Select random sequences from a FASTA file. Usage: rselect m [n] file.seq to select m sequences. If you supply the n parameter, it will pick from the first n sequences. If you don't, it will count the sequences in a separate pass and select from all of them, which is slower.
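The two modes described above can be sketched roughly as follows. This is a minimal Python illustration, not RSelect's actual code: the function names (count_records, select_records, rselect) are hypothetical, and it assumes a record is simply a '>' header line plus the lines that follow it.

```python
import random

def count_records(lines):
    """Count FASTA records: one per header line starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))

def select_records(lines, m, n, rng=random):
    """Yield the lines of m randomly chosen records among the first n."""
    chosen = set(rng.sample(range(n), m))  # m distinct record indices
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1
            if index >= n:
                break  # nothing beyond the first n records can be selected
        if index in chosen:
            yield line

def rselect(path, m, n=None):
    """Hypothetical driver: without n, count the records in a separate pass."""
    if n is None:
        with open(path) as handle:
            n = count_records(handle)
    with open(path) as handle:
        return list(select_records(handle, m, n))
```

Supplying n skips the counting pass, which is why that mode is faster. In this sketch, asking for more sequences than are available (m greater than n) raises a ValueError from random.sample.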

I can select 100K random sequences from a set (human ESTs) of 7 million in about two minutes. I've a report that the binary is slow under emulation on IA64, possibly due to mixing of code and data on the same page. It should still work, though.

It's available from this Darcs repo; you also need the bio library. A Linux binary is here.

RBR

Mask repeats in EST data sets without relying on a library of known repeats. This is especially important for 'novel' organisms, where the genome is less well known. The Darcs repo is here, and you'll need the bio library.

rbr-0.7.tgz - 2006-10-18

cluster_tools

Includes programs to filter, extract, etc. from clusterings and/or FASTA files. See the README for details, visit the Darcs repo, or just download binaries. (2006-01-29)

xsact/xtract/etc

Version 1.5, released 2006-01-31. Contains both xsact and xtract. The former has been made more robust and a bit faster, and has some new output types that I needed, but which you probably don't care about. Also updated to work with recent GHC versions. Darcs repo here.

Mail me if you want other binaries.

xsact-1.5.tgz - source code
xsact-1.5-1.i386.rpm - binary RPM
xsact-1.5-1.src.rpm - source RPM

Version 1.4, released 2004-04-07. Adds some new features, including parallelization and new output modes.

xsact-1.4.tgz - source code
xsact-1.4-x86.tgz - binary for Linux/x86 (statically linked)
xsact-1.4-sparc.tar.gz - binary for Solaris/Sparc
more old versions

Version 1.3 (released 2003-08-15) has some bugfixes and improvements; see the README. Linux binary only at the moment.

Version 1.2 (released 2003-08-05) is safer, and detects and ignores repetitive sequences that would otherwise cause a mess. No Solaris version at the moment (until I get a newer compiler installed). WARNING: won't work too well on large inputs; use 1.3 instead!

Version 1.1 (released 2003-05-26) is faster, and includes external sorting, which makes it possible to process larger amounts of ESTs. (The only change from 1.0 is a plugged memory leak.)

clusqual.py

clusqual.py - quick and dirty comparison of clusterings using the Jaccard index.
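For readers unfamiliar with the measure, here is a minimal sketch of one common way to compare two clusterings with the Jaccard index. It assumes the pair-counting form (pairs of sequences placed together in both clusterings, divided by pairs placed together in at least one); clusqual.py itself may compute it differently, and the function names are hypothetical.

```python
from itertools import combinations

def co_clustered_pairs(clustering):
    """Return the set of unordered pairs of items that share a cluster."""
    pairs = set()
    for cluster in clustering:
        # Sorting gives each pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs

def jaccard_index(clustering_a, clustering_b):
    """Pairs together in both clusterings / pairs together in either."""
    pairs_a = co_clustered_pairs(clustering_a)
    pairs_b = co_clustered_pairs(clustering_b)
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # assumption: two all-singleton clusterings count as identical
    return len(pairs_a & pairs_b) / len(union)

a = [{"s1", "s2", "s3"}, {"s4"}]
b = [{"s1", "s2"}, {"s3", "s4"}]
print(jaccard_index(a, b))  # 0.25: one shared pair out of four distinct pairs
```

Enumerating pairs is quadratic in cluster size, so this is only practical for small clusters.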

the benchmark data file

b10000-masked.seq.gz - lifted from SANBI, masked with RepeatMasker. This is necessary to test xsact.

Ketil Malde

Last modified: Fri Mar 10 15:43:17 CET 2006