Customizing the Java classes generated by XJC for the NCBI
Reminder: XJC is the Java XML Binding Compiler. It automates the mapping between XML documents and Java objects.
The code generated by XJC lets you:
- Unmarshal XML content into a Java representation
- Access and update the Java representation
- Marshal the Java representation of the XML content into XML content
For example, the following XML-Schema (tinyseq.xsd) describes a TinySeq-XML document returned by the NCBI.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:documentation> XML schema for NCBI tinyseq format</xs:documentation>
  </xs:annotation>
  <xs:complexType name="TSeqSet_t">
    <xs:annotation>
      <xs:documentation>Set of sequences</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element ref="TSeq" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="TSeq_t">
    <xs:annotation>
      <xs:documentation>A Tiny Sequence</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="TSeq_seqtype">
        <xs:complexType>
          <xs:attribute name="value">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="nucleotide"/>
                <xs:enumeration value="protein"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:complexType>
      </xs:element>
      <xs:element name="TSeq_gi" type="xs:long"/>
      <xs:element name="TSeq_accver" type="xs:string"/>
      <xs:element name="TSeq_sid" type="xs:string"/>
      <xs:element name="TSeq_taxid" type="xs:long"/>
      <xs:element name="TSeq_orgname" type="xs:string"/>
      <xs:element name="TSeq_defline" type="xs:string"/>
      <xs:element name="TSeq_length" type="xs:nonNegativeInteger"/>
      <xs:element name="TSeq_sequence" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="TSeqSet" type="TSeqSet_t"/>
  <xs:element name="TSeq" type="TSeq_t"/>
</xs:schema>
This xml-schema can be compiled with XJC:
${JAVA_HOME}/bin/xjc -d . -p generated tinyseq.xsd
parsing a schema...
compiling a schema...
generated/ObjectFactory.java
generated/TSeqSetT.java
generated/TSeqT.java
$ more generated/TSeqT.java
package generated;
(...)
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "TSeq_t", propOrder = {
    "tSeqSeqtype", "tSeqGi", "tSeqAccver","tSeqSid",(...),"tSeqSequence"})
public class TSeqT {
    @XmlElement(name = "TSeq_seqtype", required = true)
    protected TSeqT.TSeqSeqtype tSeqSeqtype;
    @XmlElement(name = "TSeq_gi")
    protected long tSeqGi;
    @XmlElement(name = "TSeq_accver", required = true)
    protected String tSeqAccver;
    @XmlElement(name = "TSeq_sid", required = true)
    protected String tSeqSid;
    @XmlElement(name = "TSeq_taxid")
    protected long tSeqTaxid;
    @XmlElement(name = "TSeq_orgname", required = true)
    protected String tSeqOrgname;
    @XmlElement(name = "TSeq_defline", required = true)
    protected String tSeqDefline;
    @XmlElement(name = "TSeq_length", required = true)
    @XmlSchemaType(name = "nonNegativeInteger")
    protected BigInteger tSeqLength;
    @XmlElement(name = "TSeq_sequence", required = true)
    protected String tSeqSequence;
    (...)
}
However, XJC does not generate common Java methods such as 'hashCode', 'equals' or 'toString', and out of the box it does not add any custom method to your classes.
Fortunately, the standard distribution of XJC comes with a plugin, enabled with the option -Xinject-code, which injects custom code into the classes generated by XJC.
For example, if we want to add a toString method to the class TSeqT, we write the following "java xml binding file" (jxb), which customizes how XJC compiles the initial XML schema:
<?xml version="1.0" encoding="UTF-8"?>
<jxb:bindings
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
    xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
    xmlns:ci="http://jaxb.dev.java.net/plugin/code-injector"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    jxb:extensionBindingPrefixes="ci "
    jxb:version="2.1"
    >
  <jxb:bindings schemaLocation="tinyseq.xsd">
    <!-- here we use an XPATH expression to tell xjc about which part of the XML schema we want to change -->
    <jxb:bindings node="/xs:schema/xs:complexType[@name='TSeq_t']">
      <ci:code>
/** toString : returns the gi and the defline */
public String toString()
    {
    return "gi:"+getTSeqGi()+"|"+getTSeqDefline();
    }
      </ci:code>
    </jxb:bindings>
  </jxb:bindings>
</jxb:bindings>
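Before running xjc, it can be handy to check which part of the schema a 'node' expression selects. The sketch below evaluates the same XPath expression with the XPath API shipped in the JDK. It is only an illustration and not something xjc requires: the class name BindingNodeCheck and the trimmed-down inline schema (only the two named complex types) are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BindingNodeCheck {
    /** Hypothetical, trimmed-down stand-in for tinyseq.xsd. */
    static final String SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='TSeqSet_t'/>"
        + "<xs:complexType name='TSeq_t'/>"
        + "</xs:schema>";

    /** Returns the 'name' attribute of every element matched by the expression. */
    static List<String> select(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed to resolve the xs: prefix
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(SCHEMA)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Map the 'xs' prefix used in the expression to the XML Schema namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "xs".equals(prefix)
                    ? XMLConstants.W3C_XML_SCHEMA_NS_URI
                    : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); ++i) {
            names.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Same expression as in the binding file above.
        System.out.println(select("/xs:schema/xs:complexType[@name='TSeq_t']"));
    }
}
```

An expression that matches nothing returns an empty list here, which is a quick way to spot a typo in the type name.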
Below, I wrote a larger JXB file 'tinyseq.jxb' which injects the following methods:
- 'equals' method for TSeq
- 'hashCode' method for TSeq
- 'toString' method for TSeq
- 'printAsFasta' method for TSeq
- 'getTSeqSetbyId' method for TSeqSet: a static function fetching a TinySeq sequence from the NCBI for a given 'gi'
- 'main' method for TSeqSet: it loops over a list of 'gi's, fetches the sequences (using NCBI-EFetch) and prints them as FASTA
<?xml version="1.0" encoding="UTF-8"?>
<jxb:bindings
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
    xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
    xmlns:ci="http://jaxb.dev.java.net/plugin/code-injector"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    jxb:extensionBindingPrefixes="ci "
    jxb:version="2.1"
    >
  <jxb:bindings schemaLocation="tinyseq.xsd">
    <jxb:bindings node="/xs:schema/xs:complexType[@name='TSeq_t']">
      <ci:code>
/* print this sequence as fasta */
public void printAsFasta(java.io.PrintStream out)
    {
    String s=getTSeqSequence();
    out.print(">gi:"+getTSeqGi()+"|"+ getTSeqAccver() +"|"+getTSeqDefline());
    for(int i=0;i &lt; s.length();++i)
        {
        if(i%60==0) out.println();
        out.print(s.charAt(i));
        }
    out.println();
    }

/** equals: two TSeq are equal if they have the same gi */
@Override
public boolean equals(Object o)
    {
    if(o==this) return true;
    if(o==null || o.getClass()!=this.getClass()) return false;
    return this.getTSeqGi()==TSeqT.class.cast(o).getTSeqGi();
    }

/** hashCode : use gi */
@Override
public int hashCode()
    {
    return (int)(this.getTSeqGi()^(this.getTSeqGi()>>>32));
    }

/** toString : returns the gi and the defline */
public String toString()
    {
    return "gi:"+getTSeqGi()+"|"+getTSeqDefline();
    }
      </ci:code>
    </jxb:bindings>
    <jxb:bindings node="/xs:schema/xs:complexType[@name='TSeqSet_t']">
      <ci:code>
/** get TSeqSetT from a given gi */
public static TSeqSetT getTSeqSetbyId(long gi)
    throws javax.xml.bind.JAXBException,
           javax.xml.bind.UnmarshalException,
           java.io.IOException
    {
    /** find the JAXB context in the defined path */
    javax.xml.bind.JAXBContext jc = javax.xml.bind.JAXBContext.newInstance(TSeqSetT.class,TSeqT.class);
    javax.xml.bind.Unmarshaller u = jc.createUnmarshaller();
    String uri="http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&amp;rettype=fasta&amp;retmode=xml&amp;id="+gi;
    /** read the sequence */
    return u.unmarshal(new javax.xml.transform.stream.StreamSource(uri),TSeqSetT.class).getValue();
    }

/** main: takes a list of gi and prints the sequences as fasta */
public static void main(String args[]) throws Exception
    {
    for(int optind=0;optind &lt; args.length;++optind)
        {
        TSeqSetT tss=TSeqSetT.getTSeqSetbyId(Long.parseLong(args[optind]));
        for(generated.TSeqT seq:tss.getTSeq())
            seq.printAsFasta(System.out);
        }
    }
      </ci:code>
    </jxb:bindings>
  </jxb:bindings>
</jxb:bindings>
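The injected methods can also be tried without running XJC. The sketch below copies their logic into a small stand-alone class. TinySeq and its constructor are hypothetical names used for this example only (the real methods live in the generated TSeqT class), toFasta returns a String with '\n' line breaks instead of writing to a PrintStream, and hashCode uses Long.hashCode, which computes the same value as the expression in the binding file.

```java
class TinySeq {
    private final long gi;
    private final String accver;
    private final String defline;
    private final String sequence;

    /** Hypothetical constructor; the generated TSeqT is filled in by JAXB instead. */
    public TinySeq(long gi, String accver, String defline, String sequence) {
        this.gi = gi;
        this.accver = accver;
        this.defline = defline;
        this.sequence = sequence;
    }

    /** Header line followed by the sequence wrapped at 60 characters per line. */
    public String toFasta() {
        StringBuilder sb = new StringBuilder(">gi:" + gi + "|" + accver + "|" + defline);
        for (int i = 0; i < sequence.length(); i += 60) {
            sb.append('\n').append(sequence, i, Math.min(i + 60, sequence.length()));
        }
        return sb.append('\n').toString();
    }

    /** Two sequences are equal if they have the same gi. */
    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (o == null || o.getClass() != this.getClass()) return false;
        return this.gi == ((TinySeq) o).gi;
    }

    /** Same value as (int)(gi ^ (gi >>> 32)). */
    @Override
    public int hashCode() {
        return Long.hashCode(gi);
    }

    /** Returns the gi and the defline. */
    @Override
    public String toString() {
        return "gi:" + gi + "|" + defline;
    }

    public static void main(String[] args) {
        TinySeq seq = new TinySeq(25L, "X53813.1", "Blue Whale heavy satellite DNA",
                                  "TAGTTATTCAACCTATCCCACTCTCTAGATACCCCTTAGCACGTAAAGGAATATTATTTGGG");
        System.out.print(seq.toFasta());
        System.out.println(seq);
    }
}
```

Because equals and hashCode both rely on the gi alone, two records with the same gi collapse to a single entry in a HashSet or HashMap, even if their deflines differ.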
Compile the schema, compile the Java classes and execute:
$ xjc -target 2.1 -verbose -Xinject-code -extension -d . -p generated -b tinyseq.jxb tinyseq.xsd
parsing a schema...
compiling a schema...
[INFO] generating code
unknown location
generated/ObjectFactory.java
generated/TSeqSetT.java
generated/TSeqT.java
$ javac generated/*.java
$ java generated.TSeqSetT 25 26 27
>gi:25|X53813.1|Blue Whale heavy satellite DNA
TAGTTATTCAACCTATCCCACTCTCTAGATACCCCTTAGCACGTAAAGGAATATTATTTG
GGGGTCCAGCCATGGAGAATAGTTTAGACACTAGGATGAGATAAGGAACACACCCATTCT
AAAGAAATCACATTAGGATTCTCTTTTTAAGCTGTTCCTTAAAACACTAGAGTCTTAGAA
ATCTATTGGAGGCAGAAGCAGTCAAGGGTAGCCTAGGGTTAGGGTTAGGCTTAGGGTTAG
GGTTAGGGTACGGCTTAGGGTACTGTTTCGGGGAGGGGTTCAGGTACGGCGTAGGGTATG
GGTTAGGGTTAGGGTTAGGGTTAGTGTTAGGGTTAGGGCTCGGTTTAGGGTACGGGTTAG
GATTAGGGTACGTGTTAGGGTTAGGGTAGGGCTTAGGGTTAGGGTACGTGTTAGGGTTAG
GG
>gi:26|X53814.1|Blue Whale heavy satellite DNA
TAGTTATTAAACCTATCCCACTCTCTAGATACACCTTAGCACGTAAAGGAATATTATTTG
GGGGTCCAGACATGGAGAAGAGTTTAGACACTAGGATAAGATAAGGAACACACCCATTCT
AAAGAAATCACATTAGGATTCTCTTTTTAAGCTGTTCCTTAAAACTCTAGTGCTTAGGAA
ATCTATTGGAGGCAGAAGCAGTCAAGGGTAGCCTAGGGTTAGGGTTAGGCTTATGGTTAG
GGCTAGGGTACGGCTTAGGGTACGGATTCGGGGAGGGGTTCGGGTACGGCGTAGGGTATG
GGTTAGGGTTAGCGTTAGTGTTAGGGTTAGGGCTCGGTTTAGGGTACGGGTTAGGATTAG
GGTACGTGTTAGGGTTAGGGTAGGGGTTAGGGTTAGGGTACGCGTTAGGGTTAGGG
>gi:27|Z18633.1|B.physalus gene for large subunit rRNA
AACCAGTATTAGAGCACTGCCTGCCCGGTGACTAATCGTTAAACGGCCGCGGTATCCTGA
CCGTGCAAAGGTAGCATAATCACTTGTTCTCTAATTAGGGACTTGTATGAATGGCCACAC
GAGGGTTTTACTGTCTCTTACTTTTAATCAGTGAAATTGACCTCTCCGTGAAGAGGCGGA
GATAACAAAATAAGACGAGAAGACCCTATGGAGCTTCAATTAATCAACCCAAAAACCATA
ACCTTAAACCACCAAGGGATAACAAAACCTTATATGGGCTGACAATTTCGGTTGGGGTGA
CCTCGGAGTACAAAAAACCCTCCGAGTGATTAAAACTTAGGCCCACTAGCCAAAGTACAA
TATCACTTATTGATCCAATCCTTTGATCAACGGAACAAGTTACCCTAGGGATAACAGCGC
AATCCTATTCTAGAGTCCATATCGACAATAGGGTTTACGACCTCGATGTTGGATCAGGAC
ATCCTAATGGTGCAGCTGCTATTAAGGGTTCGTTTGTT

That's it,
Pierre