Families and Family Groups

Gene sequences are divided into groups in order to build more reliable gene trees. In Softparsmap such a group is called a family. Families are in turn divided into groups, which are called family groups. A family is represented by the class Family and a family group by the interface FamilyGroup. Instances of this interface are responsible for providing everything needed to perform a task on a family, including the families themselves. To create your own family group, extend AbstractFamilyGroup and the abstract family group tag found in def.xml. For more information see section Create a Tag Instance.

FamilyGroupTreesInFiles retrieves gene trees from files, one tree and one family per file. Four different tags are allowed under this tag to fill the family group with families. The pair <include_directory> and <exclude_directory> includes or excludes files from a family group; the attribute tree_files_directory in these two tags specifies the directory to include or exclude. The other pair, <include_group> and <exclude_group>, includes or excludes other family groups. In the example below both tags take the attributes eid and family_group, and <include_group> additionally takes family_numbers or max_number_leaves.

When directories are included or excluded, two attributes in the family group tag determine which files are selected. Together they act as a filter.

Here is an example of a few linked family groups.
     
  <family_group did="my_super" eid="trees_in_files"
                data_source="my_data_source"
                />
      
  <family_group did="all" eid="my_super">
    <include_directory tree_files_directory="trees/dir_a"/>
    <include_directory tree_files_directory="trees/dir_b"/>
  </family_group>
      
  <family_group did="not_yet_valid" eid="my_super">
    <include_group eid="super" family_group="all"
                   family_numbers="123, 456, 789"/>
  </family_group>
		       
  <family_group did="small_trees" eid="my_super">
    <include_group eid="super" family_group="all"
                   max_number_leaves="20"/>
  </family_group>

  <family_group did="the_rest" eid="my_super">
    <include_group eid="super" family_group="all"/>
    <exclude_group eid="super" family_group="not_yet_valid"/>
    <exclude_group eid="super" family_group="small_trees"/>
  </family_group>
    
The family group with did="my_super" holds the attributes common to the other groups. The family group with did="not_yet_valid" contains three families, numbered 123, 456, and 789. A group of small families (did="small_trees") is useful when you need to test different settings, since running a task on it does not take long. The last family group, did="the_rest", includes everything in "all" and excludes the two other groups, which makes sure that no family is overlooked. For more information see def.xml.
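
The example above does not use <exclude_directory>. The following is a minimal sketch only, not taken from def.xml. It assumes that an exclusion removes files that the same family group would otherwise include, and that trees/dir_a/old is a hypothetical subdirectory picked up by <include_directory>. Under those assumptions, a group that skips that subdirectory might look like this.

  <!-- Sketch under stated assumptions: the did "without_old" and the path
       trees/dir_a/old are hypothetical, and the way the two directory tags
       interact should be checked against def.xml. -->
  <family_group did="without_old" eid="my_super">
    <include_directory tree_files_directory="trees/dir_a"/>
    <exclude_directory tree_files_directory="trees/dir_a/old"/>
  </family_group>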