The poor state of the Java web services for Bioinformatics

I wanted to test ${JAVA_HOME}/bin/wsimport against all the services in the BioCatalogue: I wrote a small Java program that uses the BioCatalogue API (see below) to extract the web services that have a WSDL file. Each WSDL URI was processed with ${JAVA_HOME}/bin/wsimport and I checked whether any class was generated. The output of 'wsimport -version' was:
JAX-WS RI 2.1.6 in JDK 6
The results are available as a Google spreadsheet.
Result
Number of services: 1644
Can't access the service, something went wrong: 6
No WSDL: 6
Found a WSDL: 1590
Number of services where wsimport failed to parse the WSDL: 1179 (74%)
Common Errors:
690 : [ERROR] rpc/encoded wsdls are not supported in JAXWS 2.0.
119 : [ERROR] undefined simple or complex type 'soapenc:Array'
96 : [ERROR] 'EndpointReference' is already defined
7 : [ERROR] only one "types" element allowed in "definitions"
6 : [ERROR] undefined simple or complex type 'apachesoap:DataHandler'
4 : [ERROR] only one of the "element" or "type" attributes is allowed in part "inDoc"
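A tally like the one above can be rebuilt from the tab-separated file written by the program in the Source Code section. The following is only a minimal sketch, assuming Java 8 or later and assuming each row has the columns service URL, title, YES, WSDL URI, SUCCESS/FAILURE and first [ERROR] line. The class name `ErrorTally` and the sample rows are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ErrorTally {
    /**
     * Counts the error message of every FAILURE row.
     * Assumed column layout: service URL, title, YES, WSDL URI,
     * SUCCESS or FAILURE, first [ERROR] line (or class count).
     */
    static Map<String, Integer> tally(List<String> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.split("\t");
            // Skip rows without a WSDL, successful rows, and failures with no message.
            if (cols.length < 6 || !"FAILURE".equals(cols[4])) {
                continue;
            }
            counts.merge(cols[5].trim(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only; they are not taken from the real spreadsheet.
        List<String> rows = Arrays.asList(
            "http://example.org/services/1\tService A\tYES\thttp://example.org/a?wsdl\tFAILURE\t[ERROR] rpc/encoded wsdls are not supported in JAXWS 2.0.",
            "http://example.org/services/2\tService B\tYES\thttp://example.org/b?wsdl\tSUCCESS\t12 classes",
            "http://example.org/services/3\tService C\tNO-WSDL");
        tally(rows).forEach((message, n) -> System.out.println(n + " : " + message));
    }
}
```

Only the first [ERROR] line of each failure is counted, because that is the only one the program writes to its output.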
Number of services successfully parsed by wsimport: 411 (26%)
Count by host:
- 306 : http://www.ebi.ac.uk
- 10 : http://www.cbs.dtu.dk
- 9 : http://prodom.prabi.fr
- 7 : http://api.bioinfo.no
- 6 : http://www.ncbi.nlm.nih.gov
- 5 : http://funcnet.eu
- 5 : http://wsdl.sbc.su.se
- 4 : http://gnode1.mib.man.ac.uk:8080
- 4 : http://www.genesilico.pl
- 3 : http://mrs.cmbi.ru.nl
- 2 : http://134.169.104.13
- 2 : http://conscore.embl.de
- 2 : http://gmd.mpimp-golm.mpg.de
- 2 : http://myhits.isb-sib.ch
- 2 : http://sabio.bioquant.uni-heidelberg.de
- 2 : http://sabio.villa-bosch.de
- 2 : http://spank.ba.itb.cnr.it
- 2 : http://trunk.cathdb.info
- 2 : http://www.amdcc.org
- 2 : http://www.ibi.vu.nl
- 2 : http://www.webservicex.com
- 1 : http://bioinformatics.ua.pt
- 1 : http://bioit.fleming.gr
- 1 : http://biotin.uio.no:8080
- 1 : http://chipster.csc.fi
- 1 : http://discover.nci.nih.gov
- 1 : http://genomematrix.molgen.mpg.de
- 1 : http://globplot.embl.de
- 1 : http://gopubmed4.biotec.tu-dresden.de
- 1 : http://inb.bsc.es
- 1 : http://iomics.ugent.be
- 1 : http://matrixdb.ibcp.fr:8080
- 1 : http://mint.bio.uniroma2.it
- 1 : http://mouse.brain-map.org
- 1 : http://myhits.vital-it.ch
- 1 : http://phospho.elm.eu.org
- 1 : http://pubchem.ncbi.nlm.nih.gov
- 1 : http://quebec.chebi.bio2rdf.org
- 1 : http://smart.embl.de
- 1 : http://smart.embl-heidelberg.de
- 1 : http://structurefilter.embl.de
- 1 : http://ubio.bioinfo.cnio.es
- 1 : http://utopia.cs.man.ac.uk
- 1 : http://wiws.cmbi.ru.nl
- 1 : http://wsembnet.vital-it.ch
- 1 : http://www.biomart.org
- 1 : http://www.chemspider.com
- 1 : http://www.comp-sys-bio.org
- 1 : http://www.jcvi.org
- 1 : http://www.migenas.mpg.de
- 1 : http://www.wikipathways.org
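The per-host counts can be derived from the WSDL URIs of the successfully parsed services. Here is a rough sketch, assuming Java 8 or later; the class `HostCount` and the URI paths in `main` are hypothetical, only the host names come from the list above.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HostCount {
    /** Returns scheme://host[:port] for a WSDL URI, or null if no host can be extracted. */
    static String hostOf(String wsdlUri) {
        try {
            URI uri = URI.create(wsdlUri.trim());
            if (uri.getHost() == null) {
                return null;
            }
            String port = uri.getPort() == -1 ? "" : ":" + uri.getPort();
            return uri.getScheme() + "://" + uri.getHost() + port;
        } catch (IllegalArgumentException err) {
            // URI.create rejects malformed input such as strings containing spaces.
            return null;
        }
    }

    /** Counts the WSDL URIs per host, sorted by host name. */
    static Map<String, Integer> countByHost(List<String> wsdlUris) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String wsdlUri : wsdlUris) {
            String host = hostOf(wsdlUri);
            if (host != null) {
                counts.merge(host, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical paths, used only to show the grouping.
        List<String> wsdlUris = Arrays.asList(
            "http://www.ebi.ac.uk/example/a?wsdl",
            "http://www.ebi.ac.uk/example/b?wsdl",
            "http://biotin.uio.no:8080/example/c?wsdl");
        countByHost(wsdlUris).forEach((host, n) -> System.out.println("- " + n + " : " + host));
    }
}
```

The port is kept when it is explicit, which matches entries such as http://biotin.uio.no:8080 in the list.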
Source Code
/**
 * test if the web services listed in http://www.biocatalogue.org/ are
 * valid for wsimport
 * Author : Pierre Lindenbaum PhD
 * http://plindenbaum.blogspot.com
 */
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
//see http://code.google.com/p/lindenb/source/browse/trunk/src/java/org/lindenb/xml/NamespaceContextImpl.java
import org.lindenb.xml.NamespaceContextImpl;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class StateOfBiocatalogue
{
    private File generated;
    private DocumentBuilder builder;
    private XPathExpression servicesExpr;
    private XPathExpression titleExpr;
    private XPathExpression wsdlExpr;
    StateOfBiocatalogue() throws Exception
    {
        DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
        factory.setCoalescing(true);
        factory.setExpandEntityReferences(true);
        factory.setValidating(false);
        factory.setIgnoringComments(true);
        factory.setNamespaceAware(true);
        factory.setIgnoringElementContentWhitespace(true);
        this.builder=factory.newDocumentBuilder();
        XPathFactory xpathFactory=XPathFactory.newInstance();
        XPath xpath=xpathFactory.newXPath();
        NamespaceContextImpl nsContext=new NamespaceContextImpl();
        nsContext.setPrefixURI("rest", "http://www.biocatalogue.org/2009/xml/rest");
        nsContext.setPrefixURI("xlink", "http://www.w3.org/1999/xlink");
        nsContext.setPrefixURI("dc", "http://purl.org/dc/elements/1.1/");
        xpath.setNamespaceContext(nsContext);
        this.servicesExpr= xpath.compile("/rest:services/rest:results/rest:service/@xlink:href");
        this.titleExpr= xpath.compile("/rest:service/dc:title");
        this.wsdlExpr= xpath.compile("/rest:service/rest:variants/rest:soapService/rest:wsdlLocation");
    }
    private int cleanup(File generated)
    {
        int count=0;
        if(!generated.exists()) return 0;
        if(generated.isDirectory())
        {
            for(File f2: generated.listFiles())
            {
                count+=cleanup(f2);
            }
            generated.delete();
        }
        else if(generated.isFile())
        {
            if(generated.getName().endsWith(".class")) ++count;
            generated.delete();
        }
        return count;
    }
    private Document parseXML(String url) throws Exception
    {
        System.err.println(url);
        int n_try=0;
        Document dom= null;
        while(n_try<10)
        {
            try
            {
                dom=this.builder.parse(url);
                break;
            }
            catch(Exception err)
            {
                Thread.sleep(30000);//wait 30 secs
                System.err.println("Trying again.... "+url);
                n_try++;
                continue;
            }
        }
        if(dom==null) throw new IOException("Cannot read "+url);
        return dom;
    }
    private void lookup(PrintWriter out,String url) throws Exception
    {
        File generated=new File(System.getProperty("java.io.tmpdir","/tmp/"),"generated");
        String javaHome= System.getProperty("java.home");
        Document dom=null;
        try
        {
            dom=parseXML(url+".xml");
        }
        catch(Exception err)
        {
            out.print(url);
            out.print("\t");
            out.print("[error]");
            out.print("\tNO:CONNECTION_FAILED");
            out.println();
            return;
        }
        NodeList wsdls=(NodeList)this.wsdlExpr.evaluate(dom, XPathConstants.NODESET);
        if(wsdls.getLength()==0)
        {
            out.print(url);
            out.print("\t");
            out.print(this.titleExpr.evaluate(dom,XPathConstants.STRING));
            out.print("\tNO-WSDL");
            out.println();
            return;
        }
        for(int i=0;i< wsdls.getLength();++i)
        {
            out.print(url);
            out.print("\t");
            out.print(this.titleExpr.evaluate(dom,XPathConstants.STRING));
            out.print("\tYES");
            out.print("\t");
            String wsdlURI=wsdls.item(i).getTextContent();
            out.print(wsdlURI);
            generated.mkdir();
            String cmdarray[]={javaHome+"/bin/wsimport",
                //"-httpproxy:host:port",
                "-d",generated.toString(),wsdlURI
                };
            StringBuilder message=new StringBuilder();
            Process proc=Runtime.getRuntime().exec(cmdarray);
            System.err.println("run : "+Arrays.toString(cmdarray));
            InputStream in=proc.getInputStream();
            int c;
            while(in!=null && (c=in.read())!=-1)
            {
                message.append((char)c);
            }
            int rez=proc.waitFor();
            if(in!=null) in.close();
            int nClass=cleanup(generated);
            if(rez==0 && nClass==0) rez=-1;
            out.print("\t"+(rez==0?"SUCCESS":"FAILURE"));
            if(rez!=0)
            {
                out.print("\t");
                for(String s:message.toString().replace("\r", "\n").replace('\t', ' ').split("[\n]"))
                {
                    s=s.trim();
                    if(!s.startsWith("[ERROR]")) continue;
                    out.print(s);
                    break;
                }
            }
            else
            {
                out.print("\t"+(nClass)+" classes");
            }
            out.println();
        }
        out.flush();
    }
    private void run(PrintWriter out) throws Exception
    {
        int service_page=0;
        while(true)
        {
            service_page++;
            Document dom=parseXML("http://www.biocatalogue.org/services.xml?page="+service_page);
            NodeList serviceList=(NodeList)this.servicesExpr.evaluate(dom, XPathConstants.NODESET);
            for(int i=0;i< serviceList.getLength();++i)
            {
                lookup(out,Attr.class.cast(serviceList.item(i)).getValue());
            }
            if(serviceList.getLength()==0) break;
        }
        System.err.println("\nDone\n");
    }
    public static void main(String[] args)
    {
        try {
            PrintWriter out=new PrintWriter(new File("biocatalog.xls"));
            new StateOfBiocatalogue().run(out);
            out.flush();
            out.close();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
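Since the most common failure reported above is the rpc/encoded one, such WSDLs could be spotted before calling wsimport at all. The sketch below is a heuristic of my own and not wsimport's logic: it assumes a WSDL 1.1 document using the SOAP 1.1 binding namespace and only looks for a soap:body element with use="encoded". The class name `EncodedWsdlCheck` and the WSDL fragment are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EncodedWsdlCheck {
    /** Namespace of the WSDL 1.1 SOAP 1.1 binding extension elements. */
    static final String SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/";

    /** Returns true if any soap:body in the WSDL declares use="encoded". */
    static boolean usesEncoded(Document wsdl) {
        NodeList bodies = wsdl.getElementsByTagNameNS(SOAP_NS, "body");
        for (int i = 0; i < bodies.getLength(); i++) {
            Element body = (Element) bodies.item(i);
            if ("encoded".equals(body.getAttribute("use"))) {
                return true;
            }
        }
        return false;
    }

    /** Parses an XML string with a namespace-aware DOM parser. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception err) {
            throw new IllegalArgumentException("Cannot parse XML", err);
        }
    }

    public static void main(String[] args) {
        // Invented, heavily truncated WSDL fragment: just enough to exercise the check.
        String wsdl =
            "<definitions xmlns='http://schemas.xmlsoap.org/wsdl/'"
            + " xmlns:soap='http://schemas.xmlsoap.org/wsdl/soap/'>"
            + "<binding name='b' type='t'>"
            + "<soap:binding style='rpc' transport='http://schemas.xmlsoap.org/soap/http'/>"
            + "<operation name='op'><input>"
            + "<soap:body use='encoded' encodingStyle='http://schemas.xmlsoap.org/soap/encoding/'/>"
            + "</input></operation>"
            + "</binding></definitions>";
        System.out.println("rpc/encoded suspected: " + usesEncoded(parse(wsdl)));
    }
}
```

A WSDL flagged this way would still need a run through wsimport to confirm the failure; the check only explains why a service is likely to end up in the rpc/encoded group.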
That's it
Pierre
6 comments:
Nice!
Oh wow, you would hope that kind of testing is done automatically and used for flagging services ! Did you already ask the BioCatalogue guys if they can perform this test automatically and on a regular basis?
@Joerg, nice suggestion, I'll suggest it to the BioCatalogue
Hello from a BioCatalogue person,
Interesting findings! And nice use of the BioCatalogue API :-) Very pleased that you provided the source code.
@Joerg, intriguing suggestion! For service monitoring we currently have regular availability checks and run pre-approved test scripts that test the functionality of individual services. Testing for certain toolkits/platforms (in this case, Java/wsimport) could have major benefits to users and providers who want to know where the services work and where they don't.
We are planning on allowing service providers etc to host/run tests like these and then publish the test results back to us via an API. This can allow us to aggregate all kinds of monitoring information. (We would mark these as "external tests" and give appropriate credit).
What do you think?
Alternatively, we could possibly wrap tests like these into test scripts and run them on our existing infrastructure.
One question I had: we are wary of ever flagging services as being "dysfunctional", so in the case of Java/wsimport how mature is that stack for web services client work?
Cheers,
Jits
@jits : "How mature is that stack for web services client work". I'm not sure how to answer this question; wsimport is a 'standard' tool released in the Java SDK. As far as I can see, the EBI generates its web services using wsgen and then, on the client side, wsimport can read those WSDLs without any problem. Java people are also using some other tools such as Apache Axis or CXF. I know those tools also fail when parsing the 'old' WSDLs (see http://stackoverflow.com/questions/2479069/importing-a-webservice).
@pierre: thanks for the clarification. I guess the world of SOAP/WSDL parsing is fraught with issues!