An XML schema (XSD) for GeneOntology
The GeneOntology can be downloaded as an RDF/XML file from http://archive.geneontology.org/latest-termdb/go_daily-termdb.rdf-xml.gz.
Although it is an RDF file, its structure is always the same, which is why it ships with a DTD describing the structure of the document ( http://www.geneontology.org/dtd/go.dtd ).
I've just written an XML schema (XSD) for this RDF file. The schema is available on GitHub at:
https://github.com/lindenb/xsd-sandbox/tree/master/schemas/bio/go.
Validation with xmllint
The RDF file is successfully validated against my XSD:

$ curl "http://archive.geneontology.org/latest-termdb/go_daily-termdb.rdf-xml.gz" |\
  gunzip -c | grep -v "<!DOCTYPE " > go.xml
$ xmllint --noout --schema go.xsd go.xml
go.xml validates

Note: I've ignored the elements defined in the DTD but absent in the RDF file.
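The same check can probably be done from Java with the JDK's javax.xml.validation API instead of xmllint. This is only a sketch: the class name ValidateGo and the method validates are names I made up for the example, and it assumes that go.xsd and go.xml are the files used in the commands above.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of the xsd-sandbox repository).
public class ValidateGo {

    /**
     * Returns true if 'xml' validates against the W3C XML schema 'xsd'.
     * A broken schema is reported as an exception, not as 'false'.
     */
    public static boolean validates(Source xsd, Source xml)
            throws SAXException, IOException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);
        Validator validator = schema.newValidator();
        try {
            validator.validate(xml);
            return true;
        } catch (SAXException err) {
            // the document does not conform to the schema
            System.err.println(err.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java ValidateGo go.xsd go.xml");
            return;
        }
        // Building the sources from File objects keeps the system id,
        // so relative imports inside the schema can be resolved.
        boolean ok = validates(
            new StreamSource(new File(args[0])),
            new StreamSource(new File(args[1])));
        System.out.println(args[1] + (ok ? " validates" : " fails to validate"));
    }
}
```

Usage would be: java ValidateGo go.xsd go.xml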
Code Generation with XJC
XJC can be used to generate the Java classes for this schema:

$ xjc go.xsd
parsing a schema...
compiling a schema...
org/w3/_1999/_02/_22_rdf_syntax_ns_/ObjectFactory.java
org/w3/_1999/_02/_22_rdf_syntax_ns_/RDF.java
org/w3/_1999/_02/_22_rdf_syntax_ns_/package-info.java
org/geneontology/dtds/go/AbstractRelation.java
org/geneontology/dtds/go/Go.java
org/geneontology/dtds/go/IsA.java
org/geneontology/dtds/go/NegativelyRegulates.java
org/geneontology/dtds/go/ObjectFactory.java
org/geneontology/dtds/go/PartOf.java
org/geneontology/dtds/go/PositivelyRegulates.java
org/geneontology/dtds/go/Regulates.java
org/geneontology/dtds/go/package-info.java
Java Parsing
... and we can now parse the terms of GO with Java without writing a new parser and without any external dependencies. For example, the following code parses the whole ontology and prints it to stdout as XML:

import java.io.File;
import org.geneontology.dtds.go.*;
import org.w3._1999._02._22_rdf_syntax_ns_.*;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.Marshaller;

public class TestGo
    {
    public static void main(String[] args) throws Exception
        {
        // one context for the two packages generated by XJC
        JAXBContext jaxbCtxt = JAXBContext.newInstance(
            "org.geneontology.dtds.go:org.w3._1999._02._22_rdf_syntax_ns_");
        Marshaller marshaller = jaxbCtxt.createMarshaller();
        Unmarshaller unmarshaller = jaxbCtxt.createUnmarshaller();
        // pretty-print the output
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        // read the whole ontology into memory...
        Object go = unmarshaller.unmarshal(new File("go.xml"));
        // ... and write it back to stdout
        marshaller.marshal(go, System.out);
        }
    }

Compile and run:
$ javac TestGo.java \
    org/w3/_1999/_02/_22_rdf_syntax_ns_/ObjectFactory.java \
    org/geneontology/dtds/go/ObjectFactory.java
$ java TestGo | head -n 100
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<go xmlns="http://www.geneontology.org/dtds/go.dtd#" xmlns:ns2="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <ns2:RDF>
        <term ns2:about="http://www.geneontology.org/go#GO:0000001">
            <accession>GO:0000001</accession>
            <name>mitochondrion inheritance</name>
            <synonym>mitochondrial inheritance</synonym>
            <definition>The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.</definition>
            <is_a ns2:resource="http://www.geneontology.org/go#GO:0048308"/>
            <is_a ns2:resource="http://www.geneontology.org/go#GO:0048311"/>
        </term>
        <term ns2:about="http://www.geneontology.org/go#GO:0000002">
            <accession>GO:0000002</accession>
            <name>mitochondrial genome maintenance</name>
            <definition>The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.</definition>
            <is_a ns2:resource="http://www.geneontology.org/go#GO:0007005"/>
            <dbxref ns2:parseType="Resource">
                <database_symbol>InterPro</database_symbol>
                <reference>IPR009446</reference>
That's it,
Pierre
PS: many thanks to @bdoughan for his help on SO.