DONE: IntMap is *much* faster than Map
   - rewrite to switch between Map Integer Int and IntMap Int
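A minimal Haskell sketch of the switch. It assumes keys are already encoded as Integer at 2 bits per nucleotide. FreqTable, buildTable and countOf are hypothetical names, not the program's actual API. The table uses IntMap when the 2k key bits fit in a machine Int and falls back to Map Integer otherwise.

```haskell
import Data.Bits (finiteBitSize)
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M

-- One table type, two representations: IntMap when encoded keys fit in a
-- machine Int, Map Integer otherwise (hypothetical type).
data FreqTable = Small (IM.IntMap Int) | Large (M.Map Integer Int)

-- Assumes 2 bits per nucleotide, so a k-word key needs 2k bits; stay strictly
-- below the Int width so keys remain non-negative.
fitsInInt :: Int -> Bool
fitsInInt k = 2 * k < finiteBitSize (0 :: Int)

-- Count already-encoded keys of words of length k.
buildTable :: Int -> [Integer] -> FreqTable
buildTable k keys
  | fitsInInt k = Small (IM.fromListWith (+) [ (fromInteger w, 1) | w <- keys ])
  | otherwise   = Large (M.fromListWith (+) [ (w, 1) | w <- keys ])

-- Look up a count without caring which representation is in use.
countOf :: FreqTable -> Integer -> Int
countOf (Small m) w = IM.findWithDefault 0 (fromInteger w) m
countOf (Large m) w = M.findWithDefault 0 w m

isSmall :: FreqTable -> Bool
isSmall (Small _) = True
isSmall (Large _) = False
```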

DONE: Sequences are parsed and passed around.  With lazy bytestrings,
   this is (memory-)inefficient, and we should instead stream over the file
   multiple times.  Also, we could build partial indices (for
   different word prefixes) and prune less interesting bits (e.g. rare parts)
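A sketch of prefix-partitioned counting plus pruning. It assumes plain String sequences, and an in-memory list stands in for re-reading the file once per prefix. kwords, partialIndex and pruneRare are hypothetical helpers.

```haskell
import Data.List (isPrefixOf, tails)
import qualified Data.Map.Strict as M

-- All k-words (length-k substrings) of a sequence.
kwords :: Int -> String -> [String]
kwords k s = [ w | t <- tails s, let w = take k t, length w == k ]

-- Partial index: count only the k-words that start with 'prefix'.
-- One pass per prefix keeps only part of the table in memory at a time.
partialIndex :: String -> Int -> [String] -> M.Map String Int
partialIndex prefix k seqs =
  M.fromListWith (+) [ (w, 1) | s <- seqs, w <- kwords k s, prefix `isPrefixOf` w ]

-- Drop rare words (count below 'minCount') from a (partial) table.
pruneRare :: Int -> M.Map String Int -> M.Map String Int
pruneRare minCount = M.filter (>= minCount)
```

The union of the partial tables over all one-letter prefixes equals the full table, so each pass only has to hold its own share.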

TODO: Check out options for the freqtable data structure.
      Things to try out: fmindex/afi, hashtable, accumArray, HsJudy.
      Perhaps also combine key and count into a single Int, Integer, or Int64?
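A sketch of packing a key and a count into one Int. It assumes a 16-bit count field (countBits is an illustrative choice) and non-negative keys that fit in the remaining high bits (47 on a 64-bit Int). Counts saturate rather than overflow into the key. All names are hypothetical.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))

-- Assumption: the low countBits bits hold the count, the high bits the key.
countBits :: Int
countBits = 16

countMask :: Int
countMask = (1 `shiftL` countBits) - 1

-- Pack a non-negative key and a count (assumed < 2^countBits) into one Int.
pack :: Int -> Int -> Int
pack key count = (key `shiftL` countBits) .|. (count .&. countMask)

unpackKey :: Int -> Int
unpackKey x = x `shiftR` countBits

unpackCount :: Int -> Int
unpackCount x = x .&. countMask

-- Increment the count, saturating instead of spilling into the key bits.
bump :: Int -> Int
bump x
  | unpackCount x == countMask = x
  | otherwise                  = x + 1
```

Packed values order by key first, so a sorted unboxed array of them could be binary-searched on the key.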

----------------------------------------

DONE: support arbitrary-length keys
  (falling back from Int32 to Int64 to Integer)
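A sketch of picking the key type from the word length. It assumes 2 bits per nucleotide (A=0, C=1, G=2, T=3) and signed key types that must hold non-negative values. KeyRep, keyRep, encode, asInt32 and asInt64 are hypothetical names.

```haskell
import Data.Int (Int32, Int64)
import Data.List (foldl')

-- Which representation a k-word key needs (hypothetical type).
data KeyRep = KInt32 | KInt64 | KInteger deriving (Eq, Show)

-- Smallest signed type that holds 2k bits as a non-negative value.
keyRep :: Int -> KeyRep
keyRep k
  | 2 * k <= 31 = KInt32
  | 2 * k <= 63 = KInt64
  | otherwise   = KInteger

-- Assumed 2-bit code per nucleotide.
baseCode :: Char -> Integer
baseCode 'A' = 0
baseCode 'C' = 1
baseCode 'G' = 2
baseCode 'T' = 3
baseCode c   = error ("baseCode: unexpected " ++ [c])

-- Encode a word as an Integer; narrow it only when keyRep says it fits.
encode :: String -> Integer
encode = foldl' (\acc c -> acc * 4 + baseCode c) 0

asInt32 :: String -> Int32
asInt32 = fromInteger . encode

asInt64 :: String -> Int64
asInt64 = fromInteger . encode
```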

DONE: optimize by entering only one of w and (revcompl w), the minimum
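A sketch of the canonical-word trick on String words. revcompl follows the name in the note; canonical is a hypothetical helper. With the A<C<G<T 2-bit encoding, comparing encoded keys of equal length picks the same minimum as comparing the strings.

```haskell
-- Reverse complement of a DNA word.
revcompl :: String -> String
revcompl = reverse . map compl
  where
    compl 'A' = 'T'
    compl 'T' = 'A'
    compl 'C' = 'G'
    compl 'G' = 'C'
    compl c   = c   -- leave N and other symbols unchanged

-- Canonical form: the smaller of w and its reverse complement, so a word
-- and its reverse complement share a single table entry.
canonical :: String -> String
canonical w = min w (revcompl w)
```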

DONE: trap exceptions from parsing and fall back to "trivial" parsing
      - done by eliminating the complicated parsing

DONE: auto-limit heap to 80% of physical memory
  (but this doesn't work on CentOS :-( )
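A sketch of deriving the limit on Linux, assuming the text of /proc/meminfo is available (Linux-only). heapLimitFlag is hypothetical and only computes the GHC RTS -M flag as a pure function; MemTotal is reported in kB.

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (stripPrefix)

-- Hypothetical helper: build "-M<n>k" for 80% of MemTotal, or Nothing if
-- the MemTotal line is missing or malformed.
heapLimitFlag :: String -> Maybe String
heapLimitFlag meminfo =
  case [ rest | l <- lines meminfo, Just rest <- [stripPrefix "MemTotal:" l] ] of
    (rest : _) ->
      case takeWhile isDigit (dropWhile isSpace rest) of
        ""     -> Nothing
        digits -> Just ("-M" ++ show (read digits * 8 `div` 10 :: Integer) ++ "k")
    [] -> Nothing
```

Passing the result as +RTS -M...k -RTS requires the binary to be linked with -rtsopts.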

TODO: support shaped keys
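One possible reading of "shaped keys" is spaced seeds: a shape such as 1101 marks which positions of each window belong to the key. A sketch under that assumption, with shapedKeys as a hypothetical helper; an all-ones shape reduces to ordinary contiguous k-words.

```haskell
-- Hypothetical: a shape is a string of '1' (position used) and '0' (ignored).
-- Each key keeps the '1' positions of one window of length (length shape).
shapedKeys :: String -> String -> [String]
shapedKeys shape s =
  [ [ c | (c, '1') <- zip window shape ]
  | i <- [0 .. length s - width]
  , let window = take width (drop i s) ]
  where width = length shape
```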

DONE: support sparse keys (every nth position)
    try to fit new sequences to old keys (add the number of
    unregistered positions to the score?)
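A sketch of sparse indexing with dense lookup, assuming String sequences; sparseKeys and queryHits are hypothetical. Only every nth position of the indexed sequences contributes a key, while every position of a new sequence is looked up.

```haskell
import Data.List (tails)
import qualified Data.Set as S

-- Index side: keep only the k-words starting at every n-th position.
sparseKeys :: Int -> Int -> String -> [(Int, String)]
sparseKeys n k s =
  [ (i, w) | (i, t) <- zip [0 ..] (tails s), i `mod` n == 0
           , let w = take k t, length w == k ]

-- Query side: check every k-word of a new sequence against the sparse keys,
-- returning the query offsets that hit.
queryHits :: Int -> S.Set String -> String -> [Int]
queryHits k keys q =
  [ i | (i, t) <- zip [0 ..] (tails q), let w = take k t
      , length w == k, w `S.member` keys ]
```

Any stretch of at least k + n - 1 positions shared with an indexed sequence contains at least one sampled key, so it yields at least one hit.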

DONE: gap closing

TODO: Repeat ID: 
	output report (.tbl, .out)
	* calculate 1st- to kth-order entropy
	* other?
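A sketch of empirical kth-order entropy. Here it is taken as the entropy of each symbol given its k preceding symbols, weighted by how often each context occurs (other normalizations exist). entropyOfCounts, kthOrderEntropy and entropies are hypothetical names.

```haskell
import Data.List (tails)
import qualified Data.Map.Strict as M

-- Shannon entropy, in bits, of a distribution given as counts.
entropyOfCounts :: [Int] -> Double
entropyOfCounts cs =
  negate (sum [ p * logBase 2 p | c <- cs, c > 0, let p = fromIntegral c / total ])
  where total = fromIntegral (sum cs)

-- Entropy of each symbol given its k predecessors, weighted by context
-- frequency. k = 0 gives the plain symbol entropy.
kthOrderEntropy :: Int -> String -> Double
kthOrderEntropy k s
  | null pairs = 0
  | otherwise  = sum [ weight next * entropyOfCounts (M.elems next) | next <- M.elems table ]
  where
    -- (context, next symbol) for every position with a full k-symbol context
    pairs = zip (map (take k) (tails s)) (drop k s)
    table = M.fromListWith (M.unionWith (+))
              [ (ctx, M.singleton c (1 :: Int)) | (ctx, c) <- pairs ]
    n     = fromIntegral (length pairs) :: Double
    weight next = fromIntegral (sum (M.elems next)) / n

-- Entropies of orders 1..k.
entropies :: Int -> String -> [Double]
entropies k s = [ kthOrderEntropy j s | j <- [1 .. k] ]
```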

TODO: Mask against library

TODO: three-pass: build frequency table (FT), build library, mask against it
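A sketch of the three passes on String sequences. It assumes a plain count threshold decides what goes into the library and that masking means replacing with N (lowercase soft masking is the other common convention). All function names are hypothetical.

```haskell
import Data.List (tails)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

kwords :: Int -> String -> [String]
kwords k s = [ w | t <- tails s, let w = take k t, length w == k ]

-- Pass 1: frequency table (FT) of all k-words.
freqTable :: Int -> [String] -> M.Map String Int
freqTable k seqs = M.fromListWith (+) [ (w, 1) | s <- seqs, w <- kwords k s ]

-- Pass 2: library = words occurring at least 'minCount' times
-- (assumption: a plain count threshold marks a word as repetitive).
buildLibrary :: Int -> M.Map String Int -> S.Set String
buildLibrary minCount = M.keysSet . M.filter (>= minCount)

-- Pass 3: replace with 'N' every position covered by a library word.
maskSeq :: Int -> S.Set String -> String -> String
maskSeq k lib s = [ if i `S.member` covered then 'N' else c | (i, c) <- zip [0 ..] s ]
  where
    covered = S.fromList [ p | (j, w) <- zip [0 :: Int ..] (kwords k s)
                             , w `S.member` lib, p <- [j .. j + k - 1] ]
```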

TODO: calculate the distribution and mask over windows (w=200? 400?)
   this avoids treating sequences of different lengths differently
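A sketch of per-window count distributions, assuming a frequency table has already been built. Windows here group k-words by start position in blocks of w (e.g. 200), so every full window contributes the same number of counts regardless of sequence length. The names and the upper-median summary are illustrative only.

```haskell
import Data.List (sort, tails)
import qualified Data.Map.Strict as M

kwords :: Int -> String -> [String]
kwords k s = [ w | t <- tails s, let w = take k t, length w == k ]

-- Split a list into consecutive blocks of w elements (last may be shorter).
chunksOf :: Int -> [a] -> [[a]]
chunksOf w xs
  | w <= 0    = error "chunksOf: window size must be positive"
  | null xs   = []
  | otherwise = take w xs : chunksOf w (drop w xs)

-- For each window, the sorted table counts of the k-words starting in it.
windowDistribs :: Int -> Int -> M.Map String Int -> String -> [[Int]]
windowDistribs w k ft s =
  [ sort [ M.findWithDefault 0 x ft | x <- ws ] | ws <- chunksOf w (kwords k s) ]

-- One possible per-window summary: the upper median of its counts.
upperMedian :: [Int] -> Maybe Int
upperMedian [] = Nothing
upperMedian xs = Just (sort xs !! (length xs `div` 2))
```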

- Clustering

- Clustering with (SG/Lee) assembly

 -> statistics to use when clustering:
    1. mode of the word-count distribution (= coverage)
    2. estimated p = 1 - var/mu (= avg. overlap)
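A sketch of the two statistics over a list of word counts. The p estimate follows from a binomial model: with mean mu = np and variance var = np(1 - p), var/mu = 1 - p, so p = 1 - var/mu. Function names are hypothetical; ties in the mode go to the smaller count, and an empty list gives NaN for p.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as M

-- Mode of the word-count distribution (ties go to the smaller count).
modeCount :: [Int] -> Maybe Int
modeCount [] = Nothing
modeCount cs = Just (fst (maximumBy (comparing key) (M.toList hist)))
  where
    hist = M.fromListWith (+) [ (c, 1 :: Int) | c <- cs ]
    key (c, f) = (f, negate c)

-- Population mean and variance of the counts.
meanVar :: [Int] -> (Double, Double)
meanVar cs = (mu, var)
  where
    xs  = map fromIntegral cs
    n   = fromIntegral (length cs)
    mu  = sum xs / n
    var = sum [ (x - mu) ^ (2 :: Int) | x <- xs ] / n

-- Binomial estimate of p from var = n p (1 - p) and mu = n p.
estimateP :: [Int] -> Double
estimateP cs = 1 - var / mu
  where (mu, var) = meanVar cs
```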