Note that this page remains for historical reasons, and that the software now has a new home at malde.org.
Select random sequences from a FASTA file. Usage: rselect m [n] file.seq selects m sequences. If you supply the n parameter, it picks from the first n sequences; if not, it counts the sequences in a separate pass and selects from all of them, which is slower.
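The two modes described above (pick from the first n records, or count first and then pick from all of them) can be illustrated with a short Python sketch. This is not the code of rselect itself, only a minimal illustration of the idea; the function names count_records, select_records and rselect_sketch are hypothetical, and the sketch assumes that every FASTA record starts with a header line beginning with '>'.

```python
import io
import random


def count_records(handle):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in handle if line.startswith(">"))


def select_records(handle, m, n, rng=random):
    """Yield m randomly chosen records (as text) from the first n records.

    Raises ValueError (from rng.sample) if m is larger than n.
    """
    wanted = set(rng.sample(range(n), m))  # indices of the records to keep
    index = -1
    record = None  # lines of the record being kept, or None while skipping
    for line in handle:
        if line.startswith(">"):
            if record is not None:
                yield "".join(record)
                record = None
            index += 1
            if index >= n:
                return  # past the first n records, nothing more to select
            if index in wanted:
                record = [line]
        elif record is not None:
            record.append(line)
    if record is not None:
        yield "".join(record)


def rselect_sketch(path, m, n=None):
    """Hypothetical driver: without n, count in a separate pass (slower)."""
    if n is None:
        with open(path) as handle:
            n = count_records(handle)
    with open(path) as handle:
        return list(select_records(handle, m, n))


if __name__ == "__main__":
    demo = io.StringIO(">a\nACGT\n>b\nGGCC\n>c\nTTAA\n")
    for rec in select_records(demo, m=2, n=3):
        print(rec, end="")
```

Supplying n skips the counting pass, which is why that mode is faster; the price is that only the first n records can ever be chosen.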
I can select 100K random sequences from a set (human ESTs) of 7 million in about two minutes. I've a report that the binary is slow under emulation on IA64, possibly due to mixing of code and data on the same page. It should still work, though.
It's available from this Darcs repo; you also need the bio library. A Linux binary is here.
Mask repeats in EST data sets without relying on a library of known repeats. This is especially important for 'novel' organisms, where the genome is less well known. Darcs repo here, and you'll need the bio library.
Includes programs to filter, extract, etc. from clusterings and/or FASTA files. See the README for details, visit the darcs repo, or just download binaries. (2006-01-29)
Version 1.5, released 2006-01-31. Contains both xsact and xtract. The former has been made more robust and a bit faster, and has some new output types that I needed, but which you probably don't care about. Also updated to work with recent GHC versions. Darcs repo here.
Mail me if you want other binaries.
Version 1.4, released 2004-04-07. Adds some new features, including parallelization and new output modes.
Version 1.3, released 2003-08-15, with some bugfixes and improvements; see the README. Linux binary only at the moment.
Version 1.2 (released 2003-08-05) is safer, and detects and ignores repetitive sequences that would otherwise cause a mess. No Solaris version at the moment (until I get a newer compiler installed). WARNING: won't work too well on large inputs; use 1.3 instead!
Version 1.1 (released 2003-05-26) is faster, and includes external sorting, which makes it possible to process larger amounts of ESTs. (The only change from 1.0 is a plugged memory leak.)