18 September 2012

Describing protein-protein interactions in XML: customizing the xsd-schema with JAXB, my notebook.

Say you want to describe a network of protein-protein interactions in an XML format. Your XML schema will contain a set of:

  • Articles/References
  • Proteins
  • Interactions
Here is a simple XSD schema for this model:
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://www.example.org/" targetNamespace="http://www.example.org/" elementFormDefault="qualified">
  <complexType name="article">
    <sequence>
      <element name="title" type="string"/>
      <element name="year" type="gYear"/>
    </sequence>
    <attribute name="pmid" type="ID" use="required"/>
  </complexType>
  <complexType name="protein">
    <sequence>
      <element name="acn" type="ID"/>
      <element name="description" type="string"/>
    </sequence>
  </complexType>
  <complexType name="interaction">
    <sequence>
      <element name="pmids" type="int" minOccurs="1" maxOccurs="unbounded"/>
      <element name="proteins" type="IDREF" minOccurs="1" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <complexType name="interactome">
    <sequence>
      <element name="article" type="tns:article" minOccurs="0" maxOccurs="unbounded"/>
      <element name="protein" type="tns:protein" minOccurs="0" maxOccurs="unbounded"/>
      <element name="interaction" type="tns:interaction" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <element name="interactome" type="tns:interactome"/>
</schema>
Here, the attributes 'type="ID"' and 'type="IDREF"' are used to link the entities (one protein can take part in several interactions, ...).
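To see what the ID/IDREF pairing enforces, here is a minimal sketch that validates two small documents with the JDK's javax.xml.validation API. It embeds a trimmed-down variant of the schema (proteins and interactions only); the class name 'IdRefCheck', the accession numbers and the descriptions are made up for the illustration. A document whose 'proteins' element names an accession that no 'acn' declares should be rejected by the validator.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class IdRefCheck {
    // Trimmed-down variant of the schema: proteins and interactions only (illustration).
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
      + " xmlns:tns='http://www.example.org/'"
      + " targetNamespace='http://www.example.org/'"
      + " elementFormDefault='qualified'>"
      + "<complexType name='protein'><sequence>"
      + "<element name='acn' type='ID'/>"
      + "<element name='description' type='string'/>"
      + "</sequence></complexType>"
      + "<complexType name='interaction'><sequence>"
      + "<element name='proteins' type='IDREF' maxOccurs='unbounded'/>"
      + "</sequence></complexType>"
      + "<complexType name='interactome'><sequence>"
      + "<element name='protein' type='tns:protein' minOccurs='0' maxOccurs='unbounded'/>"
      + "<element name='interaction' type='tns:interaction' minOccurs='0' maxOccurs='unbounded'/>"
      + "</sequence></complexType>"
      + "<element name='interactome' type='tns:interactome'/>"
      + "</schema>";

    // Builds a two-protein document whose interaction links P12345 to 'ref'.
    // The accession numbers and descriptions are invented.
    static String doc(String ref) {
        return "<interactome xmlns='http://www.example.org/'>"
             + "<protein><acn>P12345</acn><description>protein A</description></protein>"
             + "<protein><acn>Q67890</acn><description>protein B</description></protein>"
             + "<interaction><proteins>P12345</proteins><proteins>" + ref + "</proteins></interaction>"
             + "</interactome>";
    }

    // Returns true if the document validates against XSD, false on a validation error.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("known accession:   " + isValid(doc("Q67890")));
        System.out.println("unknown accession: " + isValid(doc("X00000")));
    }
}
```

Note that an ID value has to be a valid XML name, so it cannot start with a digit; the accessions above were chosen with that in mind.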
The Java classes for those types can be generated with ${JAVA_HOME}/bin/xjc:
$ xjc  interactome.xsd
parsing a schema...
compiling a schema...
org/example/Article.java
org/example/Interaction.java
org/example/Interactome.java
org/example/ObjectFactory.java
org/example/Protein.java
org/example/package-info.java
Problem: xjc doesn't know what kind of object an IDREF points to. What type should the method 'getProteins' of the class 'Interaction' return? As a consequence, xjc generates the following code:

$ more org/example/Interaction.java

    (...)
    protected List<JAXBElement<Object>> proteins;
    (...)
    public List<JAXBElement<Object>> getProteins()

We can tell xjc about those links by creating a binding file (JXB). In the following file, we tell xjc that the entities linked by 'proteins' are instances of 'Protein':
<?xml version="1.0" encoding="UTF-8"?>
<jxb:bindings xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc" xmlns:jxb="http://java.sun.com/xml/ns/jaxb" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" jxb:version="2.1">
  <jxb:bindings schemaLocation="interactome.xsd">
    <jxb:bindings node="/xs:schema/xs:complexType[@name='interaction']/xs:sequence">
      <jxb:bindings node="xs:element[@name='proteins']">
        <jxb:property>
          <jxb:baseType name="Protein"/>
        </jxb:property>
      </jxb:bindings>
    </jxb:bindings>
  </jxb:bindings>
</jxb:bindings>
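
The 'node' attributes are XPath expressions evaluated against the schema, and a binding only takes effect if its expression selects a schema node. Before running xjc, an expression can be checked with the JDK's own XPath engine. The sketch below is only an illustration: the class name 'BindingXPathCheck' is made up, and the embedded schema is reduced to the 'interaction' type. It counts the nodes selected by an expression; stray whitespace inside the quoted name is enough to make the count drop to zero.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BindingXPathCheck {
    // Reduced copy of the schema: only the 'interaction' type (illustration).
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'>"
      + "<complexType name='interaction'><sequence>"
      + "<element name='pmids' type='int' maxOccurs='unbounded'/>"
      + "<element name='proteins' type='IDREF' maxOccurs='unbounded'/>"
      + "</sequence></complexType>"
      + "</schema>";

    // Counts the schema nodes selected by an XPath expression using the 'xs' prefix.
    static int countMatches(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the xs: prefix to match
        Document schema = dbf.newDocumentBuilder()
                             .parse(new InputSource(new StringReader(XSD)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "xs".equals(prefix) ? XMLConstants.W3C_XML_SCHEMA_NS_URI
                                           : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) {
                return Collections.emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate(expression, schema, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String base = "/xs:schema/xs:complexType[@name='interaction']/xs:sequence";
        System.out.println(countMatches(base + "/xs:element[@name='proteins']"));
        System.out.println(countMatches(base + "/xs:element[@name=' proteins']"));
    }
}
```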

Invoking XJC with the bindings:
$ xjc -b interactome.jxb  interactome.xsd
parsing a schema...
compiling a schema...
org/example/Article.java
org/example/Interaction.java
org/example/Interactome.java
org/example/ObjectFactory.java
org/example/Protein.java
org/example/package-info.java

The generated class 'Interaction.java' now contains the correct Java type:


$ more org/example/Interaction.java

   (...)
    protected List<Protein> proteins;
    (...)
    public List<Protein> getProteins() {
     (....)
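
What the typed list buys in calling code can be sketched without JAXB on the classpath. The 'Protein' and 'Interaction' classes below are hand-written stand-ins, not the classes xjc generates (no annotations, no unmarshalling), and the accession numbers are invented. The point is only that a List<Protein> can be walked without casting or unwrapping a JAXBElement.

```java
import java.util.ArrayList;
import java.util.List;

class TypedLinks {
    // Hand-written stand-in for the generated Protein class (no JAXB annotations).
    static class Protein {
        private final String acn;
        Protein(String acn) { this.acn = acn; }
        String getAcn() { return acn; }
    }

    // Stand-in for the generated Interaction class, exposing the typed list.
    static class Interaction {
        private final List<Protein> proteins = new ArrayList<>();
        List<Protein> getProteins() { return proteins; }
    }

    // Builds an interaction between two invented proteins.
    static Interaction sample() {
        Interaction interaction = new Interaction();
        interaction.getProteins().add(new Protein("P12345"));
        interaction.getProteins().add(new Protein("Q67890"));
        return interaction;
    }

    // Collects the accession numbers of the proteins taking part in an interaction.
    static List<String> accessions(Interaction interaction) {
        List<String> result = new ArrayList<>();
        for (Protein p : interaction.getProteins()) {
            result.add(p.getAcn()); // no cast, no JAXBElement unwrapping
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(accessions(sample()));
    }
}
```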

That's it,
Pierre

1 comment:

Blaise Doughan said...

Hi Pierre,

Very nice article.

-Blaise