Tree Parsers

The TreeParser interface is used to read and write gene trees. The Newick and Schreiber formats are implemented. The interface is somewhat redundant because it handles both streams and strings, but both are used often enough to make this worthwhile. To create your own parser, extend AbstractTreeParser and the abstract tag found in the file def.xml. For more information, see the section Create a Tag Instance.
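
As a minimal sketch of what such a tag might look like, the definition below follows the pattern of the built-in parsers. The did my_format and the class example.MyTreeParser are hypothetical names that are not part of softparsmap. The class is assumed to extend softparsmap.AbstractTreeParser, and instance_holder is copied from the newick definition rather than known to be required.

    <!-- Hypothetical tag: the did and class names are placeholders -->
    <tree_parser did="my_format" eid="abstract"
                  class="example.MyTreeParser"
                  instance_holder="singleton"
                  />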

The Newick Format

TreeParserNewick is a parser that supports the Newick format. Three versions are defined, but you can easily define new ones by extending any of them and changing their attributes. The following definitions are taken from the def.xml file. See the Quick Start for Jooc to find out how attributes and inheritance work.


    <tree_parser did="abstract" eid="instance::linked, interface"
                  class="softparsmap.AbstractTreeParser"
                  abstract="yes"
                  encoding="@@string:=iso-8859-1"
                  />

    <tree_parser did="newick" eid="abstract"
                  class="softparsmap.TreeParserNewick"
                  instance_holder="singleton"
                  edge_type="@@instance||edge_type::interface:=unknown"
                  markers_begin="@@string:={"
                  markers_end="@@string:=}"
                  recursive_begin="@@string:=("
                  recursive_end="@@string:=)"
                  recursive_child_separator="@@string:=,"
                  marker_edge_value="@@string:={value}"
                  marker_leaf_label="@@string:={label}"
                  template_node_data="@@string:="
                  template_leaf="@@string:={label}"
                  before_body="@@string:="
                  after_body="@@string:=;"
                  />

    <tree_parser did="newick_edge_value" eid="newick"
                  template_node_data="{value}"
                  template_leaf="{label}"
                  />

    <tree_parser did="newick_edge_value_leaves_too" eid="newick"
                  template_node_data="{value}"
                  template_leaf="{label}:{value}"
                  />
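
Following the same pattern, a further variant could be sketched as shown below. The did newick_edge_value_leaves_only is hypothetical and does not appear in def.xml. The sketch assumes that overriding only template_leaf leaves the empty template_node_data inherited from newick in place, so that trees would presumably carry edge values on leaves only, as in "((1:100.0, 2:100.0), (3:100.0, 4:100.0))".

    <!-- Hypothetical variant: edge values on leaves only -->
    <tree_parser did="newick_edge_value_leaves_only" eid="newick"
                  template_leaf="{label}:{value}"
                  />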
      
The parser named newick parses trees without any edge data "((1, 2), (3, 4))", the newick_edge_value parser parses trees with internal edge data "((1, 2):100.0, (3, 4):95.0)", and the newick_edge_value_leaves_too parser parses trees with internal edge values as well as leaf edge data "((1:100.0, 2:100.0):100.0, (3:100.0, 4:100.0):95.0)". Here is the list of important attributes and what they control.

The Schreiber Format

This parser is divided into a tree structure parser and a node converter. The split makes it more flexible, because the two parts can be combined in different ways.

Three structures are currently supported. The most common one has the form [[1, 2], [3, [1]]]; its parser parses the tree structure and calls the node converter on every internal node and leaf.


    <tree_structure_parser did="schreiber" eid="instance::linked, interface"
                           class="softparsmap.TreeStructureParserSchreiber"
                           instance_holder="singleton"
                           before_core="@@string:=["
                           after_core="@@string:=]"
                           left_node_marker="@@string:=["
                           right_node_marker="@@string:=]"
                           node_divider="@@string:=,"
                           child_node_divider="@@string:=,"
                           />
      
Combining this structure parser with the node converter

    <node_converter did="schreiber_gene_node_label" 
                    eid="string_template_gene_tree"
                    class="softparsmap.StringGeneNodeConverterSchreiberLabel"
                    edge_type="@@instance||edge_type::interface:=unknown"
                    left_linked_node="@@string:=["
                    right_linked_node="@@string:=]"
                    />
      
creates the standard Schreiber tree parser

    <tree_parser did="schreiber_gene_label" eid="dual"
                 tree_structure_parser="schreiber"
                 node_converter="schreiber_gene_node_label"
                 />
      
There are two more structure parsers and six more node converters, giving almost 7*3=21 combinations. Almost, because some combinations are not valid. For more information, see def.xml.
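
As a sketch of how a further combination might be defined, the fragment below derives a structure parser that uses parentheses instead of square brackets and pairs it with the schreiber_gene_node_label node converter. Both did values (schreiber_paren and schreiber_paren_gene_label) are hypothetical. It is an assumption that tree_structure_parser tags can be extended through eid in the same way as tree_parser tags, and this particular pairing has not been checked for validity.

    <!-- Hypothetical structure parser: parentheses instead of brackets -->
    <tree_structure_parser did="schreiber_paren" eid="schreiber"
                           before_core="@@string:=("
                           after_core="@@string:=)"
                           left_node_marker="@@string:=("
                           right_node_marker="@@string:=)"
                           />

    <!-- Hypothetical combination with the existing node converter -->
    <tree_parser did="schreiber_paren_gene_label" eid="dual"
                 tree_structure_parser="schreiber_paren"
                 node_converter="schreiber_gene_node_label"
                 />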