This method will build the tree that is defined as G_g in proof of lemma 4.6 with the difference of the way the trees G_{D_k} are built and that it will try to move duplications further down to reduce loss. 0. BuildGeneTree(P as list) 1. let G_d be an empty list. When adding gene nodes to this list, it will maintain the order such that for all gene nodes in G_d |SL(G_d[i])| < |SL(G_d[j])| iff i 1 do 5. let i = 1; 6. let found_pair = false 7. while i < size(G_d) and not found_pair do 8. let g be a new unconnected gene node 9. add G_d[0] and G_d[i] as children under g 10. if MoveDuplicationsDownSingle(g, G_d[0], G_d[i]) > 0 then 11. remove G_d[0] from G_d 12. remove G_d[i] from G_d 13. add g to G_d 14. found_pair = true 15. else 16. remove the child G_d[0] from under g 17. remove the child G_d[i] from under g 18. i = i + 1 19. if not found_pair then 20. let g be a new unconnected gene node 21. add G_d[0] and G_d[1] as children under g 22. remove G_d[0] from G_d 23. remove G_d[1] from G_d 24. add g to G_d 25. MoveDuplicationsDown(G_d[0]) 26. return G_d[0] This method will build the tree that is defined as G_{D_k} in the proof of lemma 4.6 with the difference that all uncertainties that is found in the species tree will not be removed. 0. BuildGeneTreeUsingSpeciesTree(G as list) 1. let s = m(G[0]) 2. for i = 1 to size(G) - 1 3. s = Lca(s, m(G[i])) 4. let species_to_gene be a map from species nodes to gene nodes 5. let g_r be a new unconnected gene node 6. map s to g_r in species_to_genes 7. for every g_i in G 8. let g be a new unconnected gene node 9. g.m = m(g_i) 10. g.M = M(g_i) 11. g.SL = SL(g_i) 12. set s = m(g_i) 13. if s exists in species_to_genes then 14. let g_p be what s maps to in species_to_genes 15. connect g as child under the parent g_p 16. else 17. while s does not exist in species_to_genes 18. let g_p be a new unconnected gene node 19. connect g as child under the parent g_p 20. map s to g_p in species_to_genes 21. g = g_p 22. s = parent(s) 23. let g_p be what s maps to in species_to_genes 24. connect g as child under the parent g_p 25. return RemoveSingleChildren(g_r) 0. RemoveSingleChildren(g as the root of a tree) 1. if g does not have children 2. return g 3. if size(children(g)) = 1 then 4. let g_c be the child of g 5. remove the child g_c from the parent g 6. if g has a parent then 7. let g_p = parent(g) 8. remove the child g from the parent g_p 9. add g_c as child to g_p 10. return RemoveSingleChildren(g_c) 11. for every child g_i in g 12. RemoveSingleChildren(g_i) 13. return g Following tree methods are used to move duplications down in the gene tree in order to reduce loss. 0. MoveDuplicationsDown(g as gene node) 1. if g does not have two children then 2. return 8. let g_1 be g's first child and g_2 be g's second child 9. let n = MoveDuplicationsDown(g_1) 10. n = n + MoveDuplicationsDown(g_2) 11. if |SL(g_1) /\ SL(g_2)| = 0 then 12. return n 13. return n + MoveDuplicationsDownSingle(g, g_1, g_2) 0. MoveDuplicationsDownSingle(g as gene node, g_1 as gene node, g_2 as gene node) 1. if g_1 has children then 2. for every child g_1i under g_1 3. int n = MoveDuplicationsDownTriplet(g, g_1, g_1i, g_2) 4. if n > 0 then 5. return n 6. if g_2 has children then 7. for every child g_2i under g_2 8. int n = MoveDuplicationsDownTriplet(g, g_2, g_2i, g_1) 9. if n > 0 then 10. return n 11. return 0 0. MoveDuplicationsDownTriplet(g as gene node, g_1 as gene node, g_1c as gene node, g_2 as gene node) 1. let S be an empty set 2. for every child g_1i under g_1 3. if g_1i != g_1c then 4. add all in M(g_1i) to S 5. add all in M(g_2) to S 6. let s_r = S[0] 7. let i = 1 8. while i < size(S) do 9. s_r = Lca(s_r, S[i]) 10. let M be an empty set 11. if S contains s_r then 12. if s_r has children then 13. add all in children(s_r) to M 14. else 15. add s_r to M 16. else 17. for every s_i in S 18. let s = s_i 19. while parent(s) != s_r do 20. s = parent(s) 21. add s to M 22. let Sl1 be an empty set 23. for every s_i in M 24. add all in the set leaves(s_i) to Sl1 where leaves(s_i) are the leaves of the rooted subtree where s_i is the root 25. if |Sl1 /\ SL(g_1c)| > 0 then 26. return 0 27. if g_1 has more then two children then 28. let g_11 be a new unconnected gene node 29. for every child g_1i under g_1 30. if g_1i != g_1c then 31. remove the child g_1i from g_1 32. add g_1i as child under g_11 33. add g_11 as child under g_1 34. remove the child g_1c from under g_1 35. remove the child g_2 from under g 36. add g_2 as child under g_1 37. add g_1c as child under g 38. unset g_1.m 39. unset g_1.M 40. unset g_1.sl 41. let g_11 and g_12 be the two children under g_1 42. return MoveDuplicationsDownSingle(g_1, g_11, g_12) + 1 This method will return the loss for the rooted subtree where g is the root. If there are any uncertainties in the tree this method will assume that they come from the species tree and therefore do not cause any loss. 0. ComputeLossBinary(g as gene node) 1. if g does not have children then 2. return 0 3. let loss = 0; 4. if g has more then two children then 5. for every child g_i under g 6. loss = loss + ComputeLossBinary(g_i) 7. else 8. let g_1 and g_2 be g's two children 9. let is_g_duplicated = |SL(g_1) /\ SL(g_2)| > 0 10. loss = loss + ComputeLossBinary(g, g_1, is_g_duplicated) 11. loss = loss + ComputeLossBinary(g, g_2, is_g_duplicated) 12. loss = loss + ComputeLossBinary(g_1) 13. loss = loss + ComputeLossBinary(g_2) 14. return loss 0. ComputeLossBinary(g as gene node, g_c as one of g's two children, is_g_duplicated as boolean) 1. let l_1 = 0 2. if m(g) != m(g_c) then 3. let s = parent(m(g_c)) 4. while s != m(g) do 5. l_1 = l_1 + 1 6. s = parent(s) 7. let l_2 = 0 8. if Below(m(g_c), m(g)) and not (size(children(m(g_c))) = 0 or M(g_c) contains all children(m(g_c))) then 9. l_2 = 1 10. let l_3 = 1 11. if not is_g_duplicated or (is_g_duplicated and M(g) = M(g_c)) then 12. l_3 = 0 13. return l_1 + l_2 + l_3