17 May 2010

The poor state of the Java web services for Bioinformatics

In his latest post, Brad Chapman cited Jessica Kissinger, who wished the Galaxy community could access the web services listed in http://www.biocatalogue.org/. This reminded me of a thread I started on http://biostar.stackexchange.com/ : "Anyone using 'Biomart + Java Web Services' ?", where Michael Dondrup and I realized that there was poor support of the Java Web Services API for Biomart.

I wanted to test ${JAVA_HOME}/bin/wsimport against all the services in the BioCatalogue: I wrote a small Java program that uses the BioCatalogue API (see below) to extract the web services having a WSDL file. Each WSDL URI was processed with ${JAVA_HOME}/bin/wsimport and I checked whether any class was generated. 'wsimport -version' reported JAX-WS RI 2.1.6 in JDK 6.

The result is available as a Google spreadsheet at :

Result


Number of services: 1644
  Can't access the service, something went wrong: 6
  No WSDL: 6
  Found a WSDL: 1590


Number of services where wsimport failed to parse the WSDL: 1179 (74%)

Common Errors:
  690 : [ERROR] rpc/encoded wsdls are not supported in JAXWS 2.0.
  119 : [ERROR] undefined simple or complex type 'soapenc:Array'
  96 : [ERROR] 'EndpointReference' is already defined
  7 : [ERROR] only one "types" element allowed in "definitions"
  6 : [ERROR] undefined simple or complex type 'apachesoap:DataHandler'
  4 : [ERROR] only one of the "element" or "type" attributes is allowed in part "inDoc"
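
The most frequent error concerns WSDLs whose SOAP binding declares use="encoded", a style that JAX-WS does not support. As a minimal sketch (not part of the program used for this post), such WSDLs could be spotted before calling wsimport by looking for soap:body elements carrying that attribute. The class and method names are hypothetical, the sample WSDL is made up, and only the SOAP 1.1 binding namespace is checked.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class EncodedWsdlCheck
{
    /** namespace of the WSDL 1.1 SOAP binding extension elements */
    private static final String WSDL_SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/";

    /** Returns true if at least one soap:body in the WSDL declares use="encoded". */
    public static boolean usesSoapEncoding(InputStream wsdl)
    {
        try
        {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document dom = factory.newDocumentBuilder().parse(wsdl);
            NodeList bodies = dom.getElementsByTagNameNS(WSDL_SOAP_NS, "body");
            for(int i = 0; i < bodies.getLength(); ++i)
            {
                Element body = (Element)bodies.item(i);
                if("encoded".equals(body.getAttribute("use"))) return true;
            }
            return false;
        }
        catch(ParserConfigurationException | SAXException | IOException err)
        {
            throw new IllegalArgumentException("Cannot parse WSDL", err);
        }
    }

    public static void main(String[] args)
    {
        // made-up, heavily truncated WSDL used only for illustration
        String sample =
            "<definitions xmlns='http://schemas.xmlsoap.org/wsdl/'"
            + " xmlns:soap='http://schemas.xmlsoap.org/wsdl/soap/'>"
            + "<binding name='b'><operation name='op'><input>"
            + "<soap:body use='encoded'/>"
            + "</input></operation></binding></definitions>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println("uses SOAP encoding: " + usesSoapEncoding(in));
    }
}
```

A check like this only predicts the first error of the list; the other failures still need wsimport itself to be run.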


Number of services successfully parsed by wsimport: 411 (26%)

Count by host:
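
A per-host count can be derived from the WSDL URIs written in the fourth column of the program's output. The following is only a sketch of one way to do it: the class name is hypothetical, the sample URIs are made up, and it assumes that the host of the WSDL URI is the host of interest.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class HostTally
{
    /** Counts the WSDL URIs per host name; unparsable URIs are grouped under "(unknown)". */
    public static Map<String,Integer> countByHost(List<String> wsdlUris)
    {
        Map<String,Integer> counts = new TreeMap<String,Integer>();
        for(String s : wsdlUris)
        {
            String host;
            try
            {
                host = URI.create(s.trim()).getHost();
            }
            catch(IllegalArgumentException err)
            {
                host = null; // not a syntactically valid URI
            }
            if(host == null) host = "(unknown)";
            host = host.toLowerCase(Locale.ROOT); // host names are case-insensitive
            Integer n = counts.get(host);
            counts.put(host, n == null ? 1 : n + 1);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        // made-up URIs used only for illustration
        List<String> uris = Arrays.asList(
            "http://ws.example.org/alpha?wsdl",
            "http://ws.example.org/beta?wsdl",
            "http://other.example.org/gamma.wsdl"
            );
        for(Map.Entry<String,Integer> e : countByHost(uris).entrySet())
        {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```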


Source Code


/**
 * Test if the web services listed in http://www.biocatalogue.org/ are
 * valid for wsimport
 * Author : Pierre Lindenbaum PhD
 * http://plindenbaum.blogspot.com
 */
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
//see http://code.google.com/p/lindenb/source/browse/trunk/src/java/org/lindenb/xml/NamespaceContextImpl.java
import org.lindenb.xml.NamespaceContextImpl;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class StateOfBiocatalogue
{
    private DocumentBuilder builder;
    /** selects the URL of each service listed in a page of the catalogue */
    private XPathExpression servicesExpr;
    /** selects the title of a service */
    private XPathExpression titleExpr;
    /** selects the WSDL location(s) of a SOAP service */
    private XPathExpression wsdlExpr;

    StateOfBiocatalogue() throws Exception
    {
        DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
        factory.setCoalescing(true);
        factory.setExpandEntityReferences(true);
        factory.setValidating(false);
        factory.setIgnoringComments(true);
        factory.setNamespaceAware(true);
        factory.setIgnoringElementContentWhitespace(true);
        this.builder=factory.newDocumentBuilder();
        XPathFactory xpathFactory=XPathFactory.newInstance();
        XPath xpath=xpathFactory.newXPath();
        NamespaceContextImpl nsContext=new NamespaceContextImpl();
        nsContext.setPrefixURI("rest", "http://www.biocatalogue.org/2009/xml/rest");
        nsContext.setPrefixURI("xlink", "http://www.w3.org/1999/xlink");
        nsContext.setPrefixURI("dc", "http://purl.org/dc/elements/1.1/");
        xpath.setNamespaceContext(nsContext);
        this.servicesExpr= xpath.compile("/rest:services/rest:results/rest:service/@xlink:href");
        this.titleExpr= xpath.compile("/rest:service/dc:title");
        this.wsdlExpr= xpath.compile("/rest:service/rest:variants/rest:soapService/rest:wsdlLocation");
    }
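
    /** Recursively deletes 'generated' and returns the number of *.class files it contained. */
    private int cleanup(File generated)
    {
        int count=0;
        if(!generated.exists()) return 0;
        if(generated.isDirectory())
        {
            File[] children=generated.listFiles();
            if(children!=null) // listFiles() returns null on I/O error
            {
                for(File f2: children)
                {
                    count+=cleanup(f2);
                }
            }
            generated.delete();
        }
        else if(generated.isFile())
        {
            if(generated.getName().endsWith(".class")) ++count;
            generated.delete();
        }
        return count;
    }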

    /** Downloads and parses an XML document, retrying up to 10 times on failure. */
    private Document parseXML(String url) throws Exception
    {
        System.err.println(url);
        int n_try=0;
        Document dom= null;
        while(n_try<10)
        {
            try
            {
                dom=this.builder.parse(url);
                break;
            }
            catch(Exception err)
            {
                Thread.sleep(30000);//wait 30 secs
                System.err.println("Trying again.... "+url);
                n_try++;
            }
        }
        if(dom==null) throw new IOException("Cannot read "+url);
        return dom;
    }

    /** Runs wsimport on each WSDL of one service and prints one tab-delimited row per WSDL. */
    private void lookup(PrintWriter out,String url) throws Exception
    {
        File generated=new File(System.getProperty("java.io.tmpdir","/tmp/"),"generated");
        String javaHome= System.getProperty("java.home");
        Document dom=null;
        try
        {
            dom=parseXML(url+".xml");
        }
        catch(Exception err)
        {
            out.print(url);
            out.print("\t");
            out.print("[error]");
            out.print("\tNO:CONNECTION_FAILED");
            out.println();
            return;
        }
        NodeList wsdls=(NodeList)this.wsdlExpr.evaluate(dom, XPathConstants.NODESET);
        if(wsdls.getLength()==0)
        {
            out.print(url);
            out.print("\t");
            out.print(this.titleExpr.evaluate(dom,XPathConstants.STRING));
            out.print("\tNO-WSDL");
            out.println();
            return;
        }
        for(int i=0;i< wsdls.getLength();++i)
        {
            out.print(url);
            out.print("\t");
            out.print(this.titleExpr.evaluate(dom,XPathConstants.STRING));
            out.print("\tYES");
            out.print("\t");
            String wsdlURI=wsdls.item(i).getTextContent();
            out.print(wsdlURI);
            generated.mkdir();
            String cmdarray[]={javaHome+"/bin/wsimport",
                //"-httpproxy:host:port",
                "-d",generated.toString(),wsdlURI
                };
            StringBuilder message=new StringBuilder();
            System.err.println("run : "+Arrays.toString(cmdarray));
            // merge stderr into stdout so that an unread stderr cannot block wsimport
            ProcessBuilder processBuilder=new ProcessBuilder(cmdarray);
            processBuilder.redirectErrorStream(true);
            Process proc=processBuilder.start();
            InputStream in=proc.getInputStream();
            int c;
            while((c=in.read())!=-1)
            {
                message.append((char)c);
            }
            int rez=proc.waitFor();
            in.close();
            // count the generated classes: a zero exit status without any class is a failure
            int nClass=cleanup(generated);
            if(rez==0 && nClass==0) rez=-1;
            out.print("\t"+(rez==0?"SUCCESS":"FAILURE"));
            if(rez!=0)
            {
                out.print("\t");
                // only report the first [ERROR] line
                for(String s:message.toString().replace("\r", "\n").replace('\t', ' ').split("[\n]"))
                {
                    s=s.trim();
                    if(!s.startsWith("[ERROR]")) continue;
                    out.print(s);
                    break;
                }
            }
            else
            {
                out.print("\t"+(nClass)+" classes");
            }
            out.println();
        }
        out.flush();
    }

    /** Walks through the pages of the catalogue until an empty page is returned. */
    private void run(PrintWriter out) throws Exception
    {
        int service_page=0;
        while(true)
        {
            service_page++;
            Document dom=parseXML("http://www.biocatalogue.org/services.xml?page="+service_page);
            NodeList serviceList=(NodeList)this.servicesExpr.evaluate(dom, XPathConstants.NODESET);
            for(int i=0;i< serviceList.getLength();++i)
            {
                lookup(out,Attr.class.cast(serviceList.item(i)).getValue());
            }
            if(serviceList.getLength()==0) break;
        }
        System.err.println("\nDone\n");
    }
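
    public static void main(String[] args)
    {
        PrintWriter out=null;
        try
        {
            // despite its extension, the output is a tab-delimited text file
            out=new PrintWriter(new File("biocatalog.xls"));
            new StateOfBiocatalogue().run(out);
            out.flush();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            // close the file even if the run was interrupted by an exception
            if(out!=null) out.close();
        }
    }
}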
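
The "Common Errors" counts can be recomputed from the tab-delimited file written by the program above. The sketch below is one way to do so and is not necessarily how the numbers in this post were obtained: the class name is hypothetical, the sample rows are made up, and the column layout (url, title, YES, WSDL URI, SUCCESS or FAILURE, message) is the one printed by lookup().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ErrorTally
{
    /** Counts the messages of the FAILURE rows, most frequent first. */
    public static List<Map.Entry<String,Integer>> tally(List<String> rows)
    {
        Map<String,Integer> counts = new HashMap<String,Integer>();
        for(String row : rows)
        {
            // columns: url, title, YES, WSDL URI, SUCCESS|FAILURE, message
            String[] tokens = row.split("\t", -1);
            if(tokens.length < 6 || !tokens[4].equals("FAILURE")) continue;
            String message = tokens[5].trim();
            if(message.isEmpty()) continue; // wsimport failed without an [ERROR] line
            Integer n = counts.get(message);
            counts.put(message, n == null ? 1 : n + 1);
        }
        List<Map.Entry<String,Integer>> sorted =
            new ArrayList<Map.Entry<String,Integer>>(counts.entrySet());
        Collections.sort(sorted, new Comparator<Map.Entry<String,Integer>>()
        {
            public int compare(Map.Entry<String,Integer> a, Map.Entry<String,Integer> b)
            {
                return b.getValue().compareTo(a.getValue());
            }
        });
        return sorted;
    }

    public static void main(String[] args)
    {
        // made-up rows used only for illustration
        List<String> rows = Arrays.asList(
            "http://example.org/services/1\tA\tYES\thttp://example.org/a?wsdl\tFAILURE\t[ERROR] rpc/encoded wsdls are not supported in JAXWS 2.0.",
            "http://example.org/services/2\tB\tYES\thttp://example.org/b?wsdl\tSUCCESS\t12 classes",
            "http://example.org/services/3\tC\tNO-WSDL"
            );
        for(Map.Entry<String,Integer> e : tally(rows))
        {
            System.out.println(e.getValue() + " : " + e.getKey());
        }
    }
}
```

In practice the rows would be read line by line from biocatalog.xls rather than from an in-memory list.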


That's it
Pierre

6 comments:

Egon Willighagen said...

Nice!

Unknown said...

Oh wow, you would hope that kind of testing is done automatically and used for flagging services ! Did you already ask the BioCatalogue guys if they can perform this test automatically and on a regular basis?

Pierre Lindenbaum said...

@Joerg, nice suggestion, I'll suggest it to the BioCatalogue

Jits said...

Hello from a BioCatalogue person,

Interesting findings! And nice use of the BioCatalogue API :-) Very pleased that you provided the source code.

@Joerg, intriguing suggestion! For service monitoring we currently have regular availability checks and run pre-approved test scripts that test the functionality of individual services. Testing for certain toolkits/platforms (in this case, Java/wsimport) could have major benefits to users and providers who want to know whether the services work or not.

We are planning on allowing service providers etc to host/run tests like these and then publish the test results back to us via an API. This can allow us to aggregate all kinds of monitoring information. (We would mark these as "external tests" and give appropriate credit).

What do you think?

Alternatively, we could possibly wrap tests like these into test scripts and run them on our existing infrastructure.

One question I had: we are wary of ever flagging services as being "dysfunctional", so in the case of Java/wsimport how mature is that stack for web services client work?

Cheers,
Jits

Pierre Lindenbaum said...

@jits : "How mature is that stack for web services client work". I'm not sure how I can answer this question; wsimport is a 'standard' tool released in the Java SDK. As far as I can see, the EBI generates its web services using wsgen and then, on the client side, wsimport can read those WSDLs without any problem. Java people are also using some other tools such as Apache AXIS or CXF. I know those tools also fail to parse the 'old' WSDLs (see http://stackoverflow.com/questions/2479069/importing-a-webservice).

Jits said...

@pierre: thanks for the clarification. I guess the world of SOAP/WSDL parsing is fraught with issues!