Softparsmap - Manual
The DataSource interface is responsible for providing the package with data about species trees and gene sequences. To use your own data, extend the abstract class AbstractDataSource and the abstract tag found in the def.xml file. For more information see section Create a Tag Instance.
The class DataSourceXmlNcbiTaxonomy uses the NCBI Taxonomy database to extract a species tree for the gene family.
The tag is <data_source did="xml_ncbi_taxonomy" ...> and contains the following attributes.
sequence_data
- defines which XML sequence data tag instance will be used. See section XML Sequence Data below.
ncbi_taxonomy_names_file
- points to the NCBI Taxonomy species names file, names.dmp.
ncbi_taxonomy_nodes_file
- points to the NCBI Taxonomy nodes file, nodes.dmp.
xml_database_file
- points to your XML database file.
index_file
- points to the index file.
An index file has to be created for the genes found in the XML database. This only needs to be done once, by typing
java softparsmap.DataSourceXmlNcbiTaxonomy [property file] [data source did]
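For orientation on the two files named by ncbi_taxonomy_names_file and ncbi_taxonomy_nodes_file: in the NCBI Taxonomy dump, fields are separated by a tab, a vertical bar and a tab, and each line ends with a tab and a vertical bar. The following is a minimal sketch of how such lines could be turned into a name lookup and a path towards the root. It is not the Softparsmap implementation; the class TaxonomyDumpSketch, its methods, and the toy taxonomy in main are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Softparsmap.
class TaxonomyDumpSketch {

    /** Splits one dump line into trimmed fields. */
    static String[] splitDumpLine(String line) {
        // Drop the trailing "\t|" terminator, then split on "\t|\t".
        String body = line.endsWith("\t|") ? line.substring(0, line.length() - 2) : line;
        String[] fields = body.split("\t\\|\t", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    /** Maps each scientific name found in names.dmp lines to its taxonomy id. */
    static Map<String, Integer> scientificNames(List<String> namesLines) {
        Map<String, Integer> idByName = new HashMap<>();
        for (String line : namesLines) {
            String[] f = splitDumpLine(line);
            // Columns: tax_id, name_txt, unique name, name class.
            if (f.length >= 4 && f[3].equals("scientific name")) {
                idByName.put(f[1], Integer.parseInt(f[0]));
            }
        }
        return idByName;
    }

    /** Maps each taxonomy id in nodes.dmp lines to its parent id. */
    static Map<Integer, Integer> parents(List<String> nodesLines) {
        Map<Integer, Integer> parentById = new HashMap<>();
        for (String line : nodesLines) {
            String[] f = splitDumpLine(line);
            // Columns used here: tax_id, parent tax_id (further columns are ignored).
            if (f.length >= 2) {
                parentById.put(Integer.parseInt(f[0]), Integer.parseInt(f[1]));
            }
        }
        return parentById;
    }

    /** Walks from a taxonomy id up to the root (the root is its own parent). */
    static List<Integer> lineage(int taxId, Map<Integer, Integer> parentById) {
        List<Integer> path = new ArrayList<>();
        Integer current = taxId;
        // Stop at an unknown id or when an id repeats (root or malformed cycle).
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = parentById.get(current);
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy data invented for this sketch; real files are far larger.
        List<String> names = Arrays.asList(
            "1\t|\troot\t|\t\t|\tscientific name\t|",
            "2\t|\tExamplus\t|\t\t|\tscientific name\t|",
            "3\t|\tExamplus minor\t|\t\t|\tscientific name\t|",
            "3\t|\tlesser example\t|\t\t|\tcommon name\t|");
        List<String> nodes = Arrays.asList(
            "1\t|\t1\t|\tno rank\t|",
            "2\t|\t1\t|\tgenus\t|",
            "3\t|\t2\t|\tspecies\t|");

        int id = scientificNames(names).get("Examplus minor");
        System.out.println(lineage(id, parents(nodes))); // prints [3, 2, 1]
    }
}
```

Lineages obtained this way for each organism in a gene family share their upper parts, which is the information a species tree can be assembled from.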
The class SequenceDataXml handles the XML details of the sequence data file.
The tag is <sequence_data did="xml" ...> and contains the following attributes.
file_reading_buffert_size
- determines how many bytes are read from the file in one cycle.
main_tag_name
- is the name of the main tag. All data regarding a sequence should be placed under this tag.
sequence_id_tag_name
- is the name of the tag containing the id number of this sequence. These numbers are mapped to the numbers found at the leaves of the gene trees.
sequence_tag_name
- is the name of the tag containing the sequence.
organism_name_tag_name
- is the name of the tag containing the name of the organism that harbors this sequence. This name is used to map the sequence into the NCBI Taxonomy database.
gi_number_tag_name
- is the name of the tag containing the GI number. See NCBI for more information.
partial_complete_tag_name
- is the name of the tag that marks a sequence as complete or partial. If the tag contains the value defined in the attribute complete_name, the sequence is complete; otherwise it is partial.
complete_name
- see attribute partial_complete_tag_name above.
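To make the role of the tag-name attributes above more concrete, here is a minimal sketch of reading one sequence record with configurable tag names. It is an illustration only and not the SequenceDataXml implementation: the class SequenceRecordSketch, the tag names (entry, id, seq, organism, gi, status) and the sample record are hypothetical, and the sketch uses the standard Java DOM parser for brevity, whereas the real class apparently reads the file in chunks sized by file_reading_buffert_size.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Softparsmap.
class SequenceRecordSketch {

    // Assumed values for the tag-name attributes; pick names matching your own XML.
    static final String MAIN_TAG = "entry";              // main_tag_name
    static final String SEQUENCE_ID_TAG = "id";          // sequence_id_tag_name
    static final String SEQUENCE_TAG = "seq";            // sequence_tag_name
    static final String ORGANISM_NAME_TAG = "organism";  // organism_name_tag_name
    static final String GI_NUMBER_TAG = "gi";            // gi_number_tag_name
    static final String PARTIAL_COMPLETE_TAG = "status"; // partial_complete_tag_name
    static final String COMPLETE_NAME = "complete";      // complete_name

    /** Parses the XML text and returns the first element named MAIN_TAG, or null. */
    static Element firstRecord(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName(MAIN_TAG).item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns the trimmed text of the first descendant with the given tag, or null. */
    static String childText(Element record, String tagName) {
        NodeList nodes = record.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** A record is complete only if its status tag holds exactly COMPLETE_NAME. */
    static boolean isComplete(Element record) {
        return COMPLETE_NAME.equals(childText(record, PARTIAL_COMPLETE_TAG));
    }

    public static void main(String[] args) {
        // Sample record invented for this sketch.
        String xml = "<database><entry>"
            + "<id>17</id><seq>MKVLA</seq><organism>Examplus minor</organism>"
            + "<gi>12345</gi><status>complete</status>"
            + "</entry></database>";

        Element record = firstRecord(xml);
        System.out.println("id:       " + childText(record, SEQUENCE_ID_TAG));
        System.out.println("organism: " + childText(record, ORGANISM_NAME_TAG));
        System.out.println("complete: " + isComplete(record));
    }
}
```

In this sketch a record without a status tag is treated as partial, which follows the "else it is partial" rule described for partial_complete_tag_name.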
To implement your own sequence parser, extend the abstract class AbstractSequenceData. For more information see section Create a Tag Instance and the API.