04 March 2012

Java Remote Method Invocation (RMI) for Bioinformatics

"Java Remote Method Invocation (Java RMI) enables the programmer to create distributed Java technology-based to Java technology-based applications, in which the methods of remote Java objects can be invoked from other Java virtual machines*, possibly on different hosts." [Oracle] In this post, a Java client uses RMI to send a Java class to a server; the server runs that class to analyze a DNA sequence fetched from the NCBI.

Files and directories

In this example, my files are laid out as follows:
./sandbox/client/FirstBases.java
./sandbox/client/GCPercent.java
./sandbox/client/SequenceAnalyzerClient.java
./sandbox/server/SequenceAnalyzerServiceImpl.java
./sandbox/shared/SequenceAnalyzerService.java
./sandbox/shared/SequenceAnalyzer.java
./client.policy
./server.policy

The Service: SequenceAnalyzerService.java

The remote service provided by the server is defined as an interface named SequenceAnalyzerService: it fetches a DNA sequence for a given NCBI-gi, processes the sequence with an instance of SequenceAnalyzer (see below) and returns a serializable value (that is to say, we can transmit this value through the network).
package sandbox.shared;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.io.IOException;
import org.xml.sax.SAXException;
public interface SequenceAnalyzerService extends Remote
{
public static final String SERVICE_NAME="efetch";
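/* RMI requires every remote method to declare RemoteException or one of its superclasses;
IOException is a superclass of RemoteException, so this signature qualifies. */
public java.io.Serializable analyse(int gi,SequenceAnalyzer analyzer) throws IOException,SAXException;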
}

Extract a value from a DNA sequence : SequenceAnalyzer

The interface SequenceAnalyzer defines how the remote service should process a sequence. The 'SequenceAnalyzerService' uses a SAX parser to read a TinySeq-XML document from the NCBI and calls the method characters each time a chunk of sequence is found. A sequence may arrive in several chunks, so an implementation has to accumulate its state across calls. At the end, the remote server returns the value computed by getResult:
package sandbox.shared;
import java.io.Serializable;
public interface SequenceAnalyzer extends Serializable
{
public void characters(char content[],int pos,int length);
public Serializable getResult();
}
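
To make the contract concrete without any network or XML, here is a minimal local sketch. It redeclares the interface inside a hypothetical class (ChunkedFeedSketch), and both BaseCounter and feedInChunks are names invented for this illustration; they are not part of the original code. It only shows that characters can be called several times for one sequence:

```java
import java.io.Serializable;

class ChunkedFeedSketch {
    /** Local copy of the SequenceAnalyzer contract described above. */
    interface SequenceAnalyzer extends Serializable {
        void characters(char[] content, int pos, int length);
        Serializable getResult();
    }

    /** Hypothetical analyzer: counts the bases it has seen so far. */
    static class BaseCounter implements SequenceAnalyzer {
        private int count = 0;

        @Override
        public void characters(char[] content, int pos, int length) {
            count += length; // accumulate across successive calls
        }

        @Override
        public Serializable getResult() {
            return count; // boxed to an Integer
        }
    }

    /** Feeds a sequence in chunks of at most chunkSize characters, as a SAX parser might. */
    static Serializable feedInChunks(String dna, int chunkSize, SequenceAnalyzer analyzer) {
        char[] buffer = dna.toCharArray();
        for (int pos = 0; pos < buffer.length; pos += chunkSize) {
            int length = Math.min(chunkSize, buffer.length - pos);
            analyzer.characters(buffer, pos, length);
        }
        return analyzer.getResult();
    }

    public static void main(String[] args) {
        // Three calls to characters(): 4 + 4 + 2 bases.
        System.out.println(feedInChunks("ACGTACGTAC", 4, new BaseCounter())); // prints 10
    }
}
```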

Server side : an implementation of SequenceAnalyzerService

The class SequenceAnalyzerServiceImpl implements the service SequenceAnalyzerService. In the method analyse, a SAXParser is created and the sequence for the given 'gi' is downloaded from the NCBI. The instance of SequenceAnalyzer received from the client is invoked for each chunk of DNA. At the end, the "value" computed by the instance of SequenceAnalyzer is returned to the client through the network. The 'main' method contains the code that binds this service to the RMI registry:
package sandbox.server;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.io.Serializable;
import java.io.IOException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.SAXException;
import sandbox.shared.*;
public class SequenceAnalyzerServiceImpl implements SequenceAnalyzerService
{
static private class Handler extends DefaultHandler
{
boolean inSeq=false;
SequenceAnalyzer analyzer;
public void startElement(String uri,
String localName,
String qName,
Attributes attributes) throws SAXException
{
if(qName.equals("TSeq_sequence")) inSeq=true;
}
public void characters(char[] ch,
int start,
int length) throws SAXException
{
if(!inSeq) return;
analyzer.characters(ch,start,length);
}
public void endElement(String uri,
String localName,
String qName) throws SAXException
{
inSeq=false;
}
}
public SequenceAnalyzerServiceImpl()
{
}
@Override
public Serializable analyse(int gi,SequenceAnalyzer analyzer) throws IOException,SAXException
{
try
{
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
Handler handler=new Handler();
handler.analyzer=analyzer;
saxParser.parse(
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"+
"db=nucleotide&rettype=fasta&retmode=xml&id="+gi, handler);
return analyzer.getResult();
}
catch(javax.xml.parsers.ParserConfigurationException err)
{
throw new RuntimeException(err);
}
}
public static void main(String[] args)
{
/* protects access to system resources from untrusted downloaded
code running within the Java virtual machine. */
if (System.getSecurityManager() == null)
{
System.setSecurityManager(new SecurityManager());
}
try
{
SequenceAnalyzerService engine = new SequenceAnalyzerServiceImpl();
/* exports the supplied remote object so that it can
receive invocations of its remote methods from remote clients. */
SequenceAnalyzerService stub = (SequenceAnalyzerService) UnicastRemoteObject.exportObject(
engine,
0 //TCP port; 0 lets RMI choose an anonymous port
);
/* adds the name to the RMI registry running on the server */
Registry registry = LocateRegistry.getRegistry();
registry.rebind(SequenceAnalyzerService.SERVICE_NAME, stub);
System.out.println("SequenceAnalyzerService bound.");
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
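
The SAX part of the service can be tried offline. The sketch below is an assumption-laden illustration rather than the code above: it parses a hand-written XML string (not a real NCBI record) instead of calling efetch, and the class and method names (TinySeqSaxSketch, extractSequence) are hypothetical. The handler logic mirrors the Handler class: only text inside TSeq_sequence is kept.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TinySeqSaxSketch {
    /** Returns the text found inside TSeq_sequence elements of the given XML. */
    static String extractSequence(String xml) throws Exception {
        final StringBuilder sequence = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inSeq = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("TSeq_sequence")) inSeq = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element.
                if (inSeq) sequence.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                inSeq = false;
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return sequence.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hand-written sample document, not a real NCBI record.
        String xml = "<TSeqSet><TSeq><TSeq_gi>0</TSeq_gi>"
                   + "<TSeq_sequence>ACGTTGCA</TSeq_sequence></TSeq></TSeqSet>";
        System.out.println(extractSequence(xml)); // prints ACGTTGCA
    }
}
```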

Client side

On the client side, we're going to connect to the SequenceAnalyzerService and send two distinct implementations of SequenceAnalyzer. What's interesting here: the server doesn't know anything about those implementations of SequenceAnalyzer. The client's compiled Java classes have to be made available to the server, which loads them from the codebase declared by the client.

GCPercent.java

A first implementation of 'SequenceAnalyzer' computes the GC% of a sequence:
package sandbox.client;
import java.io.Serializable;
import sandbox.shared.SequenceAnalyzer;
public class GCPercent implements SequenceAnalyzer
{
private transient double total=0;
private transient double gc=0;
public GCPercent()
{
}
@Override
public void characters(char content[],int pos,int length)
{
total+=length;
for(int i=0;i< length;++i)
{
switch(content[pos+i])
{
case 'G':case 'C':
case 'g':case 'c':
case 'S':case 's': gc++;break; /* IUPAC code S means G or C */
default:break;
}
}
}
@Override
public Serializable getResult()
{
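if(total==0) return null;
/* NOTE: this expression is total/gc, the reciprocal of the GC fraction
(and Infinity when gc==0); use gc/total to get the GC fraction itself.
The output shown at the end of this post was produced by total/gc. */
return total/gc;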
}
}
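
GC content is usually reported as the number of G and C bases divided by the total length, whereas getResult above returns total/gc, the reciprocal of that fraction. Below is a minimal standalone sketch of the usual calculation; the class and method names (GcFractionSketch, gcFraction) are hypothetical, and working on a String instead of SAX chunks is a simplification for the example:

```java
class GcFractionSketch {
    /** Returns (G + C) / length, or NaN for an empty sequence. */
    static double gcFraction(String dna) {
        if (dna.isEmpty()) return Double.NaN;
        int gc = 0;
        for (int i = 0; i < dna.length(); i++) {
            switch (dna.charAt(i)) {
                case 'G': case 'C': case 'S': // S is the IUPAC code for G or C
                case 'g': case 'c': case 's':
                    gc++;
                    break;
                default:
                    break;
            }
        }
        return (double) gc / dna.length();
    }

    public static void main(String[] args) {
        System.out.println(gcFraction("GGCCAATT")); // prints 0.5
    }
}
```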

FirstBases

The second implementation of 'SequenceAnalyzer' retrieves the first bases of a sequence.
package sandbox.client;
import java.io.Serializable;
import sandbox.shared.SequenceAnalyzer;
public class FirstBases implements SequenceAnalyzer
{
private int count=0;
private transient StringBuilder sequence=null;
public FirstBases()
{
}
public int getCount()
{
return count;
}
public void setCount(int count)
{
this.count=count;
}
@Override
public void characters(char content[],int pos,int length)
{
if(sequence==null) sequence=new StringBuilder(getCount());
for(int i=0;i< length && sequence.length() < getCount();++i)
{
sequence.append(content[i+pos]);
}
}
@Override
public Serializable getResult()
{
if(sequence==null) return null;
return sequence.toString();
}
}
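
Both analyzers mark their working fields as transient. RMI passes a non-remote argument by serializing it, so only the non-transient state (here, count) travels to the server, and the transient fields start from their default values there. The sketch below imitates that step with plain Java serialization in memory; Probe and roundTrip are hypothetical names, and no RMI call is involved:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientFieldSketch {
    /** Hypothetical stand-in for FirstBases: one serialized field, one transient field. */
    static class Probe implements Serializable {
        private static final long serialVersionUID = 1L;
        int count = 0;
        transient StringBuilder sequence = null;
    }

    /** Serializes an object to a byte array and reads a copy back. */
    static Probe roundTrip(Probe original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Probe) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Probe probe = new Probe();
        probe.count = 7;
        probe.sequence = new StringBuilder("ACGT");
        Probe copy = roundTrip(probe);
        System.out.println(copy.count);    // prints 7
        System.out.println(copy.sequence); // prints null
    }
}
```

This is also why the result has to come back through getResult: the server works on its own copy of the analyzer, not on the client's instance.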

The Client

And here is the java code for the client. The client connects to the RMI server and invokes 'analyse' with the two instances of SequenceAnalyzer for some NCBI-gi:
package sandbox.client;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import sandbox.shared.*;
public class SequenceAnalyzerClient
{
public static void main(String args[]) throws Exception
{
/* the process of receiving the server remote object's stub
could require downloading class definitions from the server. */
if (System.getSecurityManager() == null)
{
System.setSecurityManager(new SecurityManager());
}
Registry registry = LocateRegistry.getRegistry(args[0]);
SequenceAnalyzerService comp = (SequenceAnalyzerService) registry.lookup(SequenceAnalyzerService.SERVICE_NAME);
for(int gi=25;gi<30;++gi)
{
GCPercent analyzer1=new GCPercent();
FirstBases analyzer2=new FirstBases();
analyzer2.setCount(5+gi%7);
System.err.println("gi="+gi+" gc%="+comp.analyse(gi,analyzer1));
System.err.println("gi="+gi+" start="+comp.analyse(gi,analyzer2));
}
}
}
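
The export, bind, lookup and invoke steps can be tried in a single JVM before setting up two machines. The following is only a sketch under several assumptions: the remote interface (EchoLength) and the binding name are invented for the example, the registry is created inside the program with LocateRegistry.createRegistry instead of running rmiregistry, loopback networking is assumed to be available, and no security manager or codebase is involved because every class is already on the local classpath.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class InProcessRmiSketch {
    /** Hypothetical remote interface; each remote method declares RemoteException. */
    interface EchoLength extends Remote {
        int lengthOf(String dna) throws RemoteException;
    }

    static class EchoLengthImpl implements EchoLength {
        @Override
        public int lengthOf(String dna) {
            return dna.length();
        }
    }

    static int callThroughRegistry(String dna) throws Exception {
        // Assumption: set before any RMI object is exported, so stubs point at loopback.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        EchoLengthImpl impl = new EchoLengthImpl();
        Registry registry = LocateRegistry.createRegistry(0); // 0 = anonymous port
        EchoLength stub = (EchoLength) UnicastRemoteObject.exportObject(impl, 0);
        try {
            registry.rebind("echo-length", stub);
            EchoLength found = (EchoLength) registry.lookup("echo-length");
            return found.lengthOf(dna); // invoked through the stub
        } finally {
            // Unexport both objects so the JVM can exit.
            UnicastRemoteObject.unexportObject(impl, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callThroughRegistry("ACGTACGT")); // prints 8
    }
}
```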

A note about security

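Neither the server nor the client wants to run malicious code, so RMI only downloads remote classes when a security manager is installed, and each JVM then needs a policy file. Beware that the two files below grant every permission to all code: this is convenient for a demo but gives no protection, so restrict them before using this anywhere else.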
server.policy:
grant {
permission java.security.AllPermission;
};

client.policy:
grant {
permission java.security.AllPermission;
};

Compiling and Running

Compiling the client

javac -cp . sandbox/client/SequenceAnalyzerClient.java

Compiling the server

javac -cp . sandbox/server/SequenceAnalyzerServiceImpl.java

Starting the RMI registry

${JAVA_HOME}/bin/rmiregistry

Starting the SequenceAnalyzerServiceImpl

$ java \
 -Djava.security.policy=server.policy \
 -Djava.rmi.server.codebase=file:///path/to/RMI/ \
 -cp . sandbox.server.SequenceAnalyzerServiceImpl

SequenceAnalyzerService bound.

Running the client

$ java  \
 -Djava.rmi.server.codebase=file:///path/to/RMI/ \
 -Djava.security.policy=client.policy  \
 -cp . sandbox.client.SequenceAnalyzerClient  localhost

gi=25 gc%=2.1530612244897958
gi=25 start=TAGTTATTC
gi=26 gc%=2.1443298969072164
gi=26 start=TAGTTATTAA
gi=27 gc%=2.3022222222222224
gi=27 start=AACCAGTATTA
gi=28 gc%=2.376543209876543
gi=28 start=TCGTA
gi=29 gc%=2.2014742014742015
gi=29 start=TCTTTG
That's it, Pierre
