Gene and Species Trees

Tree nodes and trees in Softparsmap are represented by five classes: Node, GeneNode, GeneLeaf, SpeciesNode, and SpeciesLeaf. These classes are linked together to form gene and species trees. The class names indicate whether a node is an internal node or a leaf. Species leaves and gene leaves are mapped one-to-many, since each gene sequence (represented by the class GeneLeaf) is found in one species (represented by the class SpeciesLeaf), while a species may harbor several gene sequences. The species tree is a subtree of the Tree of Life such that every species represented by a species leaf harbors at least one gene sequence found in the gene tree. Before gene nodes can be mapped onto species nodes in the species tree, every gene leaf has to be added to one species leaf.
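
The one-to-many relation between species leaves and gene leaves can be illustrated with a minimal sketch. The classes SpeciesLeafSketch and GeneLeafSketch, their fields, and the add method are hypothetical stand-ins written for this illustration, not the actual Softparsmap classes; the labels are taken from the example tables in section Printing Trees.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; none of these names are taken from the Softparsmap API.
class LeafMappingDemo {
    static SpeciesLeafSketch[] buildExample() {
        SpeciesLeafSketch mouse = new SpeciesLeafSketch(10090);
        SpeciesLeafSketch human = new SpeciesLeafSketch(9606);
        for (int id : new int[] {20809742, 15126606, 12860621}) {
            mouse.add(new GeneLeafSketch(id));
        }
        for (int id : new int[] {15012045, 16550688}) {
            human.add(new GeneLeafSketch(id));
        }
        return new SpeciesLeafSketch[] {mouse, human};
    }

    public static void main(String[] args) {
        for (SpeciesLeafSketch species : buildExample()) {
            System.out.println(species.label + " harbors "
                    + species.genes.size() + " gene leaves");
        }
    }
}

// Assumed stand-in for a gene leaf: one sequence, found in one species.
class GeneLeafSketch {
    final int label;
    SpeciesLeafSketch species; // null until added to a species leaf

    GeneLeafSketch(int label) {
        this.label = label;
    }
}

// Assumed stand-in for a species leaf: one species, many gene leaves.
class SpeciesLeafSketch {
    final int label;
    final List<GeneLeafSketch> genes = new ArrayList<>();

    SpeciesLeafSketch(int label) {
        this.label = label;
    }

    // Adds a gene leaf and rejects one that already belongs to a species.
    void add(GeneLeafSketch gene) {
        if (gene.species != null) {
            throw new IllegalStateException("gene leaf " + gene.label
                    + " already belongs to species leaf " + gene.species.label);
        }
        gene.species = this;
        genes.add(gene);
    }
}
```

Storing the back-reference on the gene leaf is one possible design; it lets the sketch refuse to add the same gene leaf to a second species leaf.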

A reference to a node has one of two meanings depending on the context: it refers either to the node alone or to the rooted subtree of which the node is the root. For instance,

  Set children = someNode.getChildren();
returns the children of someNode, while
  Set leaves = someNode.getLeaves();
returns the set of leaves in the rooted subtree where someNode is the root.
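
The difference between the two contexts can be sketched as follows. SketchNode is a hypothetical stand-in, not the Softparsmap Node class; only the method names getChildren() and getLeaves() come from the text above. That a leaf returns itself from getLeaves() is an assumption of this sketch. The demo rebuilds the shape of the rooted gene tree shown in section Printing Trees.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo; SketchNode is not the Softparsmap Node class.
class SketchNodeDemo {
    static SketchNode buildExampleGeneTree() {
        SketchNode n6 = new SketchNode("-6")
                .addChild(new SketchNode("12860621"))
                .addChild(new SketchNode("16550688"));
        SketchNode n4 = new SketchNode("-4")
                .addChild(new SketchNode("20809742"))
                .addChild(new SketchNode("15012045"));
        SketchNode n5 = new SketchNode("-5").addChild(n6).addChild(n4);
        return new SketchNode("-92")
                .addChild(new SketchNode("15126606"))
                .addChild(n5);
    }

    public static void main(String[] args) {
        SketchNode root = buildExampleGeneTree();
        System.out.println("children: " + root.getChildren().size());
        System.out.println("leaves:   " + root.getLeaves().size());
    }
}

class SketchNode {
    private final String label;
    private final Set<SketchNode> children = new LinkedHashSet<>();

    SketchNode(String label) {
        this.label = label;
    }

    SketchNode addChild(SketchNode child) {
        children.add(child);
        return this;
    }

    String getLabel() {
        return label;
    }

    // Node context: only the direct children of this node.
    Set<SketchNode> getChildren() {
        return children;
    }

    // Subtree context: every leaf of the subtree rooted at this node.
    Set<SketchNode> getLeaves() {
        Set<SketchNode> leaves = new LinkedHashSet<>();
        collectLeaves(this, leaves);
        return leaves;
    }

    private static void collectLeaves(SketchNode node, Set<SketchNode> out) {
        if (node.children.isEmpty()) {
            out.add(node); // assumption: a leaf is its own only leaf
            return;
        }
        for (SketchNode child : node.children) {
            collectLeaves(child, out);
        }
    }
}
```

For the root of this tree, getChildren() yields the two nodes directly below it, while getLeaves() walks the whole subtree and yields all five gene leaves.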

Edge Types

Depending on the method used to create gene trees, different types of edges are created. The interface EdgeType defines the edge-type properties that this package needs. Every gene node holds a reference to an instance of this interface. To create your own edge type, extend AbstractEdgeType and the abstract edge tag found in def.xml. For more information, see section Create a Tag Instance.

StandardEdgeType is the standard edge type in Softparsmap and has the following attributes.

For more information see section Create a Tag Instance.

Printing Trees

The class Node has three methods for printing trees to the console: toStringTree(), toStringTable(), and toStringAll(). These methods can be used to print species trees as well as gene trees. Each node in the tree has a label, and the table row with the same label holds all available data for that node. A cell containing '-' means that the data is not available for that node; a missing column means that no node in the tree has that data. Here is an example of what a species tree and a rooted gene tree look like after mutations have been inferred.


       +--(9606)-
(-9347)|
       +-(10090)-
+-------+-------------+--------------------------------+-----------------+
| Label | Class       | Seq                            | Species name    |
+-------+-------------+--------------------------------+-----------------+
| -9347 | SpeciesNode |  -                             | placentals      |
| 10090 | SpeciesLeaf | [20809742, 15126606, 12860621] | transgenic mice |
| 9606  | SpeciesLeaf | [15012045, 16550688]           | man             |
+-------+-------------+--------------------------------+-----------------+
      
This is the species tree. The column Seq lists the sequences that exist in each species, and the column Species name gives the name of the species.
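
The two table conventions used in this section (a '-' cell where one node lacks the data, and no column at all where every node lacks it) can be sketched as below. TableSketch and its render method are hypothetical and do not reproduce toStringTable(); column alignment and border lines are left out to keep the sketch short. The rows reuse the species tree data above, plus an E.v. column that no species node has.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo; not the Softparsmap implementation of toStringTable().
class TableSketchDemo {
    static String example() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("-9347", Map.of("Class", "SpeciesNode",
                "Species name", "placentals"));
        rows.put("10090", Map.of("Class", "SpeciesLeaf",
                "Seq", "[20809742, 15126606, 12860621]",
                "Species name", "transgenic mice"));
        rows.put("9606", Map.of("Class", "SpeciesLeaf",
                "Seq", "[15012045, 16550688]",
                "Species name", "man"));
        // "E.v." is requested, but no row has it, so it should be dropped.
        return TableSketch.render(
                List.of("Class", "E.v.", "Seq", "Species name"), rows);
    }

    public static void main(String[] args) {
        System.out.print(example());
    }
}

class TableSketch {
    // rows maps a node label to that node's (column name -> value) data.
    static String render(List<String> columns,
                         Map<String, Map<String, String>> rows) {
        // Keep only the columns for which at least one node has data.
        List<String> kept = new ArrayList<>();
        for (String column : columns) {
            for (Map<String, String> row : rows.values()) {
                if (row.get(column) != null) {
                    kept.add(column);
                    break;
                }
            }
        }
        StringBuilder out = new StringBuilder("| Label");
        for (String column : kept) {
            out.append(" | ").append(column);
        }
        out.append(" |\n");
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            out.append("| ").append(row.getKey());
            for (String column : kept) {
                String value = row.getValue().get(column);
                // '-' marks data that is missing for this node only.
                out.append(" | ").append(value == null ? "-" : value);
            }
            out.append(" |\n");
        }
        return out.toString();
    }
}
```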

     +----------------(15126606)-
     |
     |             +--(12860621)-
     |      +--(-6)|
(-92)|      |      +--(16550688)-
     +--(-5)|
            |      +--(20809742)-
            +--(-4)|
                   +--(15012045)-
+----------+----------+---------+---------------+------+---------------+-------+
| Label    | Class    | E.v.    | M(g)          | N.i. | SL(g)         | m(g)  |
+----------+----------+---------+---------------+------+---------------+-------+
| -4       | GeneNode | 0.94-UN | [9606, 10090] |  -   | [10090, 9606] | -9347 |
| -5       | GeneNode | 1.0-UN  | [9606, 10090] | D1   | [10090, 9606] | -9347 |
| -6       | GeneNode | 0.92-UN | [9606, 10090] |  -   | [10090, 9606] | -9347 |
| -92      | GeneNode | NaN-UN  |  -            | D1L1 |  -            |  -    |
| 12860621 | GeneLeaf | 1.0-UN  |  -            |  -   | [10090]       | 10090 |
| 15012045 | GeneLeaf | 1.0-UN  |  -            |  -   | [9606]        | 9606  |
| 15126606 | GeneLeaf | 1.0-UN  | [10090]       |  -   | [10090]       | 10090 |
| 16550688 | GeneLeaf | 1.0-UN  |  -            |  -   | [9606]        | 9606  |
| 20809742 | GeneLeaf | 1.0-UN  |  -            |  -   | [10090]       | 10090 |
+----------+----------+---------+---------------+------+---------------+-------+	
      
This is the rooted gene tree after mutations have been inferred. The columns and their meanings are as follows.