Describing protein-protein interactions in XML: customizing the xsd-schema with JAXB, my notebook.
Say you want to describe a network of protein-protein interactions in an XML format. Your XML schema will contain a set of:
- Articles/References
- Proteins
- Interactions
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:tns="http://www.example.org/"
        targetNamespace="http://www.example.org/"
        elementFormDefault="qualified">
  <complexType name="article">
    <sequence>
      <element name="title" type="string"/>
      <element name="year" type="gYear"/>
    </sequence>
    <attribute name="pmid" type="ID" use="required"/>
  </complexType>
  <complexType name="protein">
    <sequence>
      <element name="acn" type="ID"/>
      <element name="description" type="string"/>
    </sequence>
  </complexType>
  <complexType name="interaction">
    <sequence>
      <element name="pmids" type="int" minOccurs="1" maxOccurs="unbounded"/>
      <element name="proteins" type="IDREF" minOccurs="1" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <complexType name="interactome">
    <sequence>
      <element name="article" type="tns:article" minOccurs="0" maxOccurs="unbounded"/>
      <element name="protein" type="tns:protein" minOccurs="0" maxOccurs="unbounded"/>
      <element name="interaction" type="tns:interaction" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <element name="interactome" type="tns:interactome"/>
</schema>

Here, the types 'type="ID"' and 'type="IDREF"' are used to link the entities (one protein can be part of several interactions, ...).
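To see what the ID/IDREF link buys you at validation time, here is a minimal sketch that uses only the JDK's javax.xml.validation API (not JAXB). It assumes a trimmed copy of the schema above (the 'article' type is left out for brevity); the class name IdRefCheck and the accession numbers P1, P2 and P9 are made up for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class IdRefCheck {
    static final String NS = "http://www.example.org/";

    // Trimmed copy of the schema from the post (assumption: 'article' omitted).
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema' xmlns:tns='" + NS + "'"
      + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
      + "<complexType name='protein'><sequence>"
      + "<element name='acn' type='ID'/>"
      + "<element name='description' type='string'/>"
      + "</sequence></complexType>"
      + "<complexType name='interaction'><sequence>"
      + "<element name='pmids' type='int' minOccurs='1' maxOccurs='unbounded'/>"
      + "<element name='proteins' type='IDREF' minOccurs='1' maxOccurs='unbounded'/>"
      + "</sequence></complexType>"
      + "<complexType name='interactome'><sequence>"
      + "<element name='protein' type='tns:protein' minOccurs='0' maxOccurs='unbounded'/>"
      + "<element name='interaction' type='tns:interaction' minOccurs='0' maxOccurs='unbounded'/>"
      + "</sequence></complexType>"
      + "<element name='interactome' type='tns:interactome'/>"
      + "</schema>";

    // Builds a tiny interactome: two proteins (hypothetical ids P1 and P2)
    // and one interaction between P1 and the protein named by 'ref'.
    public static String doc(String ref) {
        return "<interactome xmlns='" + NS + "'>"
             + "<protein><acn>P1</acn><description>first</description></protein>"
             + "<protein><acn>P2</acn><description>second</description></protein>"
             + "<interaction><pmids>1</pmids>"
             + "<proteins>P1</proteins><proteins>" + ref + "</proteins>"
             + "</interaction></interactome>";
    }

    // Returns true if the document validates against XSD, false otherwise.
    public static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            try {
                // Without a custom ErrorHandler, validation errors are thrown.
                validator.validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalid) {
                return false;
            }
        } catch (SAXException | IOException e) {
            // Broken schema or I/O problem: not a validation result.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("P1-P2: " + isValid(doc("P2")));
        System.out.println("P1-P9: " + isValid(doc("P9")));
    }
}
```

The second document refers to a protein that is not declared anywhere, so the validator should reject it: every IDREF must match an ID present in the same document.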
The Java classes for those types can be generated with ${JAVA_HOME}/bin/xjc:
$ xjc interactome.xsd
parsing a schema...
compiling a schema...
org/example/Article.java
org/example/Interaction.java
org/example/Interactome.java
org/example/ObjectFactory.java
org/example/Protein.java
org/example/package-info.java

Problem: xjc doesn't know what kind of object an IDREF points to. What type should the method 'getProteins' of the class 'Interaction' return? As a consequence, xjc generates the following code:
$ more org/example/Interaction.java
(...)
protected List<JAXBElement<Object>> proteins;
(...)
public List<JAXBElement<Object>> getProteins()
We can tell xjc about those links by creating a binding file (JXB). In the following file, we tell xjc that the entities linked by 'proteins' should be instances of 'Protein':
<?xml version="1.0" encoding="UTF-8"?>
<jxb:bindings xmlns:xs="http://www.w3.org/2001/XMLSchema"
              xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
              xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              jxb:version="2.1">
  <jxb:bindings schemaLocation="interactome.xsd">
    <jxb:bindings node="/xs:schema/xs:complexType[@name='interaction']/xs:sequence">
      <jxb:bindings node="xs:element[@name='proteins']">
        <jxb:property>
          <jxb:baseType name="Protein"/>
        </jxb:property>
      </jxb:bindings>
    </jxb:bindings>
  </jxb:bindings>
</jxb:bindings>
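The 'node' attributes of a binding file are XPath expressions evaluated against the schema, so it can help to check what they select before running xjc. The following is a rough sketch using only the JDK's javax.xml.xpath API; the class name BindingXPathCheck is made up, and the schema is trimmed down to the 'interaction' type.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BindingXPathCheck {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    // Trimmed copy of the schema (assumption: only 'interaction' is kept).
    static final String XSD =
        "<schema xmlns='" + XS + "'>"
      + "<complexType name='interaction'><sequence>"
      + "<element name='pmids' type='int' maxOccurs='unbounded'/>"
      + "<element name='proteins' type='IDREF' maxOccurs='unbounded'/>"
      + "</sequence></complexType>"
      + "</schema>";

    // Returns how many schema nodes the given XPath expression selects.
    public static int countMatches(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the xs: prefix to match
            Document dom = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(XSD)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Map the 'xs' prefix used in the binding file to the XSD namespace.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "xs".equals(prefix) ? XS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, dom, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "/xs:schema/xs:complexType[@name='interaction']/xs:sequence";
        System.out.println(countMatches(base + "/xs:element[@name='proteins']"));
        System.out.println(countMatches(base + "/xs:element[@name=' proteins']"));
    }
}
```

The first expression should select exactly one element and the second none. The comparison inside the predicate is an exact string match, so a stray space inside the quoted name is enough to make a binding miss its target.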
Invoking XJC with the bindings:
$ xjc -b interactome.jxb interactome.xsd
parsing a schema...
compiling a schema...
org/example/Article.java
org/example/Interaction.java
org/example/Interactome.java
org/example/ObjectFactory.java
org/example/Protein.java
org/example/package-info.java
The generated class 'Interaction' now uses the correct Java type:
$ more org/example/Interaction.java
(...)
protected List<Protein> proteins;
(...)
public List<Protein> getProteins() {
(....)
That's it,
Pierre
1 comment:
Hi Pierre,
Very nice article.
-Blaise