KNIME is a Java- and Eclipse-based graphical workflow manager.
Biologists in my lab often use this tool to filter VCFs or other tabular data. A software development kit (SDK) is provided to build new nodes. My main problem with this SDK is that you need to write a large number of similar files and you also have to interact with a graphical interface. I wanted to automate the generation of the Java code for new nodes. In this post I describe how I wrote two new nodes for reading and writing FASTA files.
The nodes are described in an XML file, and the Java code is generated with an XSLT stylesheet, which is available on GitHub at:
Example
We're going to create two nodes for FASTA:
- a fasta reader
- a fasta writer
We define a plugin.xml file that uses XInclude to include the definitions of the two nodes. The base package will be com.github.lindenb.xsltsandbox. The nodes will be displayed in the KNIME workbench under /community/bio/fasta
<?xml version="1.0" encoding="UTF-8"?>
<plugin xmlns:xi="http://www.w3.org/2001/XInclude"
        package="com.github.lindenb.xsltsandbox"
        >
  <category name="bio">
    <category name="fasta" label="Fasta">
      <xi:include href="node.read-fasta.xml"/>
      <xi:include href="node.write-fasta.xml"/>
    </category>
  </category>
</plugin>
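For readers who want to see what the xi:include elements expand to, here is a minimal sketch using the JDK's standard JAXP parser with XInclude processing switched on. It is not part of the generator: the class name XIncludeSketch and the demo() helper are hypothetical, and the included node files are reduced to empty stubs written to a temporary directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class XIncludeSketch {
    /** Parses an XML file and lets the parser replace each xi:include with the included document. */
    static Document parseWithXInclude(Path xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XInclude processing needs namespace awareness
        factory.setXIncludeAware(true);  // resolve xi:include elements while parsing
        // Parsing from a File gives the parser a base URI, so relative hrefs resolve.
        return factory.newDocumentBuilder().parse(xml.toFile());
    }

    /** Writes a stub plugin.xml plus two stub node files, then counts the merged node elements. */
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("xinclude-sketch");
            write(dir.resolve("node.read-fasta.xml"), "<node name=\"readfasta\"/>");
            write(dir.resolve("node.write-fasta.xml"), "<node name=\"writefasta\"/>");
            write(dir.resolve("plugin.xml"),
                "<plugin xmlns:xi=\"http://www.w3.org/2001/XInclude\""
                + " package=\"com.github.lindenb.xsltsandbox\">"
                + "<category name=\"bio\"><category name=\"fasta\" label=\"Fasta\">"
                + "<xi:include href=\"node.read-fasta.xml\"/>"
                + "<xi:include href=\"node.write-fasta.xml\"/>"
                + "</category></category></plugin>");
            Document merged = parseWithXInclude(dir.resolve("plugin.xml"));
            return merged.getElementsByTagName("node").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void write(Path path, String content) throws IOException {
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("node elements after XInclude: " + demo());
    }
}
```

In the post itself the merge is done by xsltproc --xinclude, so this Java version only illustrates the mechanism.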
node.read-fasta.xml: it takes a file-read property (for the input FASTA file), an integer to limit the number of FASTA sequences to be read, and a boolean to convert the sequences to uppercase. The output will be a table with two columns (title/sequence). We only write the code for reading the FASTA file.
<?xml version="1.0" encoding="UTF-8"?>
<node name="readfasta" label="Read Fasta" description="Reads a Fasta file">
  <property type="file-read" name="fastaIn">
    <extension>.fa</extension>
    <extension>.fasta</extension>
    <extension>.fasta.gz</extension>
    <extension>.fa.gz</extension>
  </property>
  <property type="int" name="limit" label="max sequences" description="number of sequences to be fetched. 0 = ALL" default="0">
  </property>
  <property type="bool" name="upper" label="Uppercase" description="Convert to Uppercase" default="false">
  </property>
  <outPort name="output">
    <column name="title" label="Title" type="string"/>
    <column name="sequence" label="Sequence" type="string"/>
  </outPort>
  <code>
    <import>
    import java.io.*;
    </import>
    <body>
    @Override
    protected BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
    {
        int limit = this.getPropertyLimitValue();
        String url = this.getPropertyFastaInValue();
        boolean to_upper = this.getPropertyUpperValue();
        getLogger().info("reading "+url);
        java.io.BufferedReader r= null;
        int n_sequences = 0;
        try
        {
            r = this.openUriForBufferedReader(url);
            DataTableSpec dataspec0 = this.createOutTableSpec0();
            BufferedDataContainer container0 = exec.createDataContainer(dataspec0);
            String seqname="";
            StringBuilder sequence=new StringBuilder();
            for(;;)
            {
                exec.checkCanceled();
                exec.setMessage("Sequences "+n_sequences);
                String line= r.readLine();
                /* a new header or the end of the file closes the current record */
                if(line==null || line.startsWith(">"))
                {
                    if(!(sequence.length()==0 &amp;&amp; seqname.trim().isEmpty()))
                    {
                        container0.addRowToTable(new org.knime.core.data.def.DefaultRow(
                            org.knime.core.data.RowKey.createRowKey(n_sequences),
                            this.createDataCellsForOutTableSpec0(seqname,sequence)
                            ));
                        ++n_sequences;
                    }
                    if(line==null) break;
                    /* stop once the requested number of sequences was read */
                    if( limit!=0 &amp;&amp; limit==n_sequences) break;
                    seqname=line.substring(1);
                    sequence=new StringBuilder();
                }
                else
                {
                    line= line.trim();
                    if( to_upper ) line= line.toUpperCase();
                    sequence.append(line);
                }
            }
            container0.close();
            BufferedDataTable out0 = container0.getTable();
            return new BufferedDataTable[]{out0};
        }
        finally
        {
            /* r is still null if the URI could not be opened */
            if(r!=null) r.close();
        }
    }
    </body>
  </code>
</node>
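The reading loop above can be tried outside KNIME. The following is a minimal standalone sketch of the same logic, with the KNIME table replaced by a plain list of {title, sequence} pairs; the class FastaReadSketch and its method names are hypothetical and are not produced by the stylesheet.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class FastaReadSketch {
    /** Reads at most 'limit' records (0 = all); each record is a {title, sequence} pair. */
    static List<String[]> read(BufferedReader r, int limit, boolean toUpper) throws IOException {
        List<String[]> records = new ArrayList<>();
        String title = "";
        StringBuilder sequence = new StringBuilder();
        for (;;) {
            String line = r.readLine();
            if (line == null || line.startsWith(">")) {
                // A new header or the end of the input closes the record collected so far.
                if (!(sequence.length() == 0 && title.trim().isEmpty())) {
                    records.add(new String[] { title, sequence.toString() });
                }
                if (line == null) break;
                if (limit != 0 && limit == records.size()) break;
                title = line.substring(1);
                sequence = new StringBuilder();
            } else {
                line = line.trim();
                if (toUpper) line = line.toUpperCase();
                sequence.append(line);
            }
        }
        return records;
    }

    /** Convenience wrapper for in-memory FASTA text. */
    static List<String[]> readString(String fasta, int limit, boolean toUpper) {
        try (BufferedReader r = new BufferedReader(new StringReader(fasta))) {
            return read(r, limit, toUpper);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String fasta = ">seq1 first\nacgt\nACGT\n>seq2\nttaa\n";
        for (String[] rec : readString(fasta, 0, true)) {
            System.out.println(rec[0] + "\t" + rec[1]);
        }
    }
}
```

As in the node, a limit of 0 means "read everything", and the limit is checked only when the next header is reached, so the last accepted sequence is always complete.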
node.write-fasta.xml: it needs an input data table with two columns (title/sequence), an integer to set the length of the lines, and a file-save property to write the FASTA file.
<?xml version="1.0" encoding="UTF-8"?>
<node name="writefasta" label="Write Fasta" description="Write a Fasta file">
  <inPort name="input">
  </inPort>
  <property type="file-save" name="fastaOut">
  </property>
  <property type="column" name="title" label="Title" description="Fasta title" data-type="string">
  </property>
  <property type="column" name="sequence" label="Sequence" description="Fasta Sequence" data-type="string">
  </property>
  <property type="int" name="fold" label="Fold size" description="Fold sequences greater than..." default="60">
  </property>
  <code>
    <import>
    import org.knime.core.data.container.CloseableRowIterator;
    import java.io.*;
    </import>
    <body>
    @Override
    protected BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
    {
        CloseableRowIterator iter=null;
        BufferedDataTable inTable=inData[0];
        int fold = this.getPropertyFoldValue();
        int tIndex = this.findTitleRequiredColumnIndex(inTable.getDataTableSpec());
        int sIndex = this.findSequenceRequiredColumnIndex(inTable.getDataTableSpec());
        PrintWriter w =null;
        try
        {
            w= openFastaOutForPrinting();
            int nRows=0;
            double total=inTable.getRowCount();
            iter=inTable.iterator();
            while(iter.hasNext())
            {
                DataRow row=iter.next();
                DataCell tCell =row.getCell(tIndex);
                DataCell sCell =row.getCell(sIndex);
                w.print(">");
                if(!tCell.isMissing())
                {
                    w.print(StringCell.class.cast(tCell).getStringValue());
                }
                if(!sCell.isMissing())
                {
                    String sequence = StringCell.class.cast(sCell).getStringValue();
                    for(int i=0;i&lt;sequence.length();++i)
                    {
                        /* start a new line every 'fold' characters, the first one ends the title line */
                        if(i%fold == 0) w.println();
                        w.print(sequence.charAt(i));
                        exec.checkCanceled();
                    }
                }
                w.println();
                exec.checkCanceled();
                exec.setProgress(nRows/total,"Saving Fasta");
                ++nRows;
            }
            w.flush();
            return new BufferedDataTable[0];
        }
        finally
        {
            /* release the row iterator as well as the writer */
            if(iter!=null) iter.close();
            if(w!=null) w.close();
        }
    }
    </body>
  </code>
</node>
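The folding logic of the writer is easier to check in isolation. Here is a minimal sketch, assuming a hypothetical helper class FastaFoldSketch that builds the text of one record in memory. Two details differ from the node and are my own assumptions: the sketch always emits "\n" where the node's println() uses the platform line separator, and it rejects a non-positive fold, a value for which the node's i%fold would fail.

```java
final class FastaFoldSketch {
    /** Formats one FASTA record, breaking the sequence every 'fold' characters. */
    static String format(String title, String sequence, int fold) {
        if (fold <= 0) {
            // Assumption of this sketch: refuse values that would make i % fold fail.
            throw new IllegalArgumentException("fold must be positive: " + fold);
        }
        StringBuilder sb = new StringBuilder(">").append(title);
        for (int i = 0; i < sequence.length(); ++i) {
            // The break at i == 0 ends the title line; later breaks fold the sequence.
            if (i % fold == 0) sb.append('\n');
            sb.append(sequence.charAt(i));
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(format("seq1", "ACGTACGTAC", 4));
    }
}
```

With a fold size of 4, the ten-base sequence above is written as two full lines of four bases followed by a line holding the remaining two.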
The following Makefile generates the code, compiles it and installs the new plugin in the ${knime.root}/plugins directory:
.PHONY: all clean install run

knime.root=${HOME}/package/knime_2.11.2

all: install

run: install
	${knime.root}/knime -clean

install:
	rm -rf generated
	xsltproc --xinclude \
		--stringparam base.dir generated \
		knime2java.xsl plugin.xml
	$(MAKE) -C generated install knime.root=${knime.root}

clean:
	rm -rf generated
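The --stringparam option passes a string value to a top-level xsl:param of the stylesheet. For readers more familiar with Java than with xsltproc, here is a rough equivalent using the JDK's javax.xml.transform API. The toy stylesheet in the sketch is hypothetical and is not knime2java.xsl: it only shows a base.dir parameter being read. Note also that a plain StreamSource does not expand XInclude, which xsltproc --xinclude does.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StringParamSketch {
    // Hypothetical toy stylesheet (NOT knime2java.xsl): prints a source directory
    // derived from the base.dir parameter and the package attribute of <plugin>.
    static final String TOY_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:param name=\"base.dir\"/>"
        + "<xsl:template match=\"/plugin\">"
        + "<xsl:value-of select=\"$base.dir\"/>/src/"
        + "<xsl:value-of select=\"translate(@package,'.','/')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies a stylesheet to an XML string after setting one string parameter. */
    static String transform(String xslt, String xml, String paramName, String paramValue) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            // Plays the role of: xsltproc --stringparam <paramName> <paramValue>
            transformer.setParameter(paramName, paramValue);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<plugin package=\"com.github.lindenb.xsltsandbox\"/>";
        System.out.println(transform(TOY_XSLT, xml, "base.dir", "generated"));
    }
}
```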
The code generated by this Makefile:
$ find generated/ -type f
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeFactory.xml
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodePlugin.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeFactory.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeDialog.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/AbstractReadfastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeFactory.xml
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodePlugin.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeFactory.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeDialog.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/AbstractWritefastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/CompileAll__.java
generated/src/com/github/lindenb/xsltsandbox/AbstractNodeModel.java
generated/MANIFEST.MF
generated/Makefile
generated/plugin.xml
generated/dist/com_github_lindenb_xsltsandbox.jar
generated/dist/com.github.lindenb.xsltsandbox_2015.02.18.jar
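The listing suggests a simple naming scheme: base directory, then the base package, then the nested category names, then the node name, with the class names prefixed by the capitalized node name. The sketch below only restates that pattern as I read it from the listing. It is not taken from the stylesheet, and the class GeneratedPathSketch and its method are hypothetical.

```java
final class GeneratedPathSketch {
    /**
     * Rebuilds the path of a per-node generated file from the plugin description.
     * The pattern is inferred from the file listing only.
     */
    static String pathFor(String baseDir, String basePackage, String[] categories,
                          String nodeName, String suffix) {
        // "readfasta" becomes the class-name prefix "Readfasta"
        String classPrefix = Character.toUpperCase(nodeName.charAt(0)) + nodeName.substring(1);
        return baseDir + "/src/"
            + basePackage.replace('.', '/') + "/"
            + String.join("/", categories) + "/"
            + nodeName + "/"
            + classPrefix + suffix;
    }

    public static void main(String[] args) {
        System.out.println(pathFor("generated", "com.github.lindenb.xsltsandbox",
            new String[] { "bio", "fasta" }, "readfasta", "NodeModel.java"));
    }
}
```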
The file generated/dist/com.github.lindenb.xsltsandbox_2015.02.18.jar is the file to move to ${knime.root}/plugins
(At the time of writing I put the jar at http://cardioserve.nantes.inserm.fr/~lindenb/knime/fasta/ )
Open KNIME: the new nodes are now displayed in the Node Repository.
You can now use the nodes; the code is displayed in the documentation:
That's it,
Pierre