22 December 2008

Knime.org: creating a new Source Node reading Fasta sequences.

This post is about how I wrote a Java plugin for the workflow engine KNIME (http://www.knime.org). This plugin reads a FASTA file containing one or more sequences and transforms it into a table containing two columns: one for the name of the sequence and the other for the sequence itself.
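To make the target format concrete, here is a minimal sketch in plain Java, with no KNIME dependency, of the FASTA-to-two-columns transformation. The class name FastaToRows and the choice of a String[]{name, sequence} pair per row are assumptions made for illustration only; this is not the plugin code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns FASTA text into rows of {name, sequence}. */
class FastaToRows {
    static List<String[]> parse(String fastaText) {
        List<String[]> rows = new ArrayList<String[]>();
        String name = null;                       // header of the current record
        StringBuilder seq = new StringBuilder();  // sequence of the current record
        for (String raw : fastaText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;         // skip blank lines
            if (line.startsWith(">")) {
                // a new header closes the previous record, if any
                if (name != null) rows.add(new String[]{name, seq.toString()});
                name = line.substring(1).trim();
                seq.setLength(0);
            } else if (name != null) {
                seq.append(line);                 // sequences may span several lines
            } else {
                throw new IllegalArgumentException("expected '>' before sequence data");
            }
        }
        if (name != null) rows.add(new String[]{name, seq.toString()});
        return rows;
    }

    public static void main(String[] args) {
        String fasta = ">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n";
        for (String[] row : parse(fasta)) {
            System.out.println(row[0] + "\t" + row[1]);  // Name <tab> Sequence
        }
    }
}
```

Each row plays the role of one line of the table: the first element is the Name column and the second is the Sequence column.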

In the last few weeks, I've been looking for a workflow engine that could be easily handled by the members of my lab. I tested three tools:

  • Taverna (now in version 2) is mainly devoted to web services. I've tried to learn Taverna a few times, but I still don't find it user-friendly or intuitive.
  • http://kepler-project.org/: 141 MB! Ouch!
  • http://www.knime.org: until the latest version, the development version crashed at startup on my computer, but now everything works fine.


KNIME, the Konstanz Information Miner, is a modular environment which enables easy visual assembly and interactive execution of a data pipeline. KNIME is built on Eclipse, and each node in the workflow is developed as a plugin. As far as I understand KNIME, it only handles tabular data, and that's why I wrote this new plugin converting FASTA sequences into a table. The KNIME SDK comes with a wizard that creates the Java stubs required for a new KNIME node. Here are a few of the files:


XXXNodeModel
Implements the logic of the node. The most important method
BufferedDataTable[] execute(final BufferedDataTable[] inData,
final ExecutionContext exec) throws Exception
takes zero or more tables as input (none for a source node like this one), transforms them and returns an array of one or more tables.

XXXNodeDialog
A Swing-based dialog used to set the options of a node. Here, I've created a dialog for selecting the FASTA file.

XXXNodeView
Visualizes the result of the node. Here, I didn't write a view, but one could imagine a graphical interface drawing the GC%, etc.

XXXNodeFactory
A class generating the model, the dialog, the views, ...

XXXNodePlugin
Describes the plugin. It is only used by Eclipse.
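As an illustration of what a view drawing the GC% would have to compute, here is a minimal sketch in plain Java. The class name GcContent is hypothetical, and the decision to count only A, C, G, T and U (ignoring N and other ambiguity codes) is an assumption, not something the plugin does.

```java
/** Hypothetical helper: computes the GC percentage of a nucleotide sequence. */
class GcContent {
    static double gcPercent(String seq) {
        int gc = 0;     // number of G or C bases
        int total = 0;  // number of recognised bases (A, C, G, T, U)
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
                total++;
            } else if (c == 'A' || c == 'T' || c == 'U') {
                total++;
            }
            // assumption: any other symbol (N, gaps, ...) is ignored
        }
        return total == 0 ? 0.0 : 100.0 * gc / total;
    }

    public static void main(String[] args) {
        System.out.println(gcPercent("ATGCGC"));
    }
}
```

A view could call such a function on each value of the Sequence column and plot the result.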



The sources I wrote are available here. Warning: the sources are a draft and I'm still learning the KNIME API; I guess there must be a cleverer/smarter/safer way to write this stuff.
  • FastaIterator.java: iterator over a fasta file
  • FastaTable.java: the tabular representation of the sequences. It contains two columns Name and Sequence
  • ReadFastaNodeDialog.java: the dialog selecting the fasta file
  • ReadFastaNodeFactory.java: the node factory
  • ReadFastaNodeModel.java: implements the logic of the node. Creates and returns the FastaTable
  • ReadFastaNodePlugin.java: used by Eclipse
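The FastaIterator has to read ahead in the file to know whether another sequence exists, so its hasNext() caches the answer until next() consumes it. The pattern can be sketched on its own in plain Java. The class LookAheadIterator is hypothetical: it skips blank strings coming from another iterator, and it assumes the source never yields null.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical illustration: yields only the non-blank strings of another iterator. */
class LookAheadIterator implements Iterator<String> {
    private final Iterator<String> source;
    private String cached = null;    // element found by the last look-ahead
    private boolean tested = false;  // true while 'cached' is up to date

    LookAheadIterator(Iterator<String> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (tested) return cached != null;  // repeated calls consume nothing
        tested = true;
        cached = null;
        while (source.hasNext()) {
            String s = source.next();
            if (!s.trim().isEmpty()) {
                cached = s;
                break;
            }
        }
        return cached != null;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        tested = false;  // force a fresh look-ahead on the next call
        return cached;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        Iterator<String> it = new LookAheadIterator(Arrays.asList("a", " ", "b", "").iterator());
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Calling hasNext() several times in a row is safe because only the first call after a next() reads from the source.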

package org.lindenb.knime.readfasta;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PushbackReader;
import org.knime.core.data.DataCell;
import org.knime.core.data.DataRow;
import org.knime.core.data.RowIterator;
import org.knime.core.data.RowKey;
import org.knime.core.data.def.DefaultRow;
import org.knime.core.data.def.StringCell;
public class FastaIterator extends RowIterator
{
private PushbackReader reader =null;
StringBuilder name=new StringBuilder();
StringBuilder seq=new StringBuilder();
int rowIndex=0;
private boolean has_next_tested=false;
private boolean has_next=false;
FastaIterator(String source)
{
try
{
this.reader = new PushbackReader(new BufferedReader(new FileReader(source)));
}
catch(IOException err)
{
this.reader=null;
err.printStackTrace();
}
}
@Override
public boolean hasNext()
{
System.err.println("hasNext called");
if(has_next_tested) return has_next;
has_next_tested=true;
has_next=false;
if(reader==null)
{
return has_next;
}
try
{
int c;
while((c=reader.read())!=-1)
{
if(Character.isWhitespace(c)) continue;
break;
}
if(c!='>')
{
throw new IOException("expected '>'");
}
while((c=reader.read())!=-1)
{
if(c=='\n') break;
if(c=='\r') continue; // ignore the carriage return of Windows line endings
name.append((char)c);
}
boolean at_start=true;
System.err.println("Found >"+name);
while((c=reader.read())!=-1)
{
if(at_start && c=='>')
{
reader.unread(c);
break;
}
else if(c=='\n')
{
at_start=true;
continue;
}
at_start=false;
if(Character.isWhitespace(c)) continue;
seq.append((char)c);
}
System.err.println("Found :"+seq);
}
catch(IOException err)
{
try { reader.close(); reader=null; } catch(IOException err2) {}
}
System.err.println("name & seq: "+name+" "+seq);
has_next=name.length()>0;
return has_next;
}
@Override
public DataRow next()
{
if(!has_next_tested) hasNext();
if(!has_next) throw new IllegalStateException("No next Fasta sequence");
has_next_tested=false;
has_next=false;
DataCell cellName=new StringCell(this.name.toString());
DataCell cellSeq=new StringCell(this.seq.toString());
this.name.setLength(0);
this.seq.setLength(0);
int index=rowIndex;
++rowIndex;
return new DefaultRow(RowKey.createRowKey(index),cellName,cellSeq);
}
}
package org.lindenb.knime.readfasta;
import org.knime.core.data.DataColumnSpec;
import org.knime.core.data.DataColumnSpecCreator;
import org.knime.core.data.DataRow;
import org.knime.core.data.DataTable;
import org.knime.core.data.DataTableSpec;
import org.knime.core.data.DataType;
import org.knime.core.data.RowIterator;
import org.knime.core.data.def.StringCell;
import org.knime.core.node.ExecutionContext;
import org.knime.core.node.NodeLogger;
public class FastaTable implements DataTable {
// the table specification: two string columns, Name and Sequence
private DataTableSpec dataTableSpec= createDataTableSpec();
private ExecutionContext exec;
private String filename;
public FastaTable(String filename,final ExecutionContext exec)
{
this.filename=filename;
this.exec=exec;
}
public static DataTableSpec createDataTableSpec()
{
DataColumnSpec name= new DataColumnSpecCreator("Name",DataType.getType(StringCell.class)).createSpec();
DataColumnSpec seq= new DataColumnSpecCreator("Sequence",DataType.getType(StringCell.class)).createSpec();
return new DataTableSpec(
name,seq
);
}
@Override
public DataTableSpec getDataTableSpec()
{
return this.dataTableSpec;
}
@Override
public RowIterator iterator()
{
System.err.println("FastaTable called with filename="+filename);
return new FastaIterator(this.filename);
}
}
package org.lindenb.knime.readfasta;
import javax.swing.JFileChooser;
import org.knime.core.node.defaultnodesettings.DefaultNodeSettingsPane;
import org.knime.core.node.defaultnodesettings.DialogComponentFileChooser;
import org.knime.core.node.defaultnodesettings.SettingsModelString;
/**
* <code>NodeDialog</code> for the "ReadFasta" Node.
* Read fasta files
*
* This node dialog derives from {@link DefaultNodeSettingsPane} which allows
* creation of a simple dialog with standard components. If you need a more
* complex dialog please derive directly from
* {@link org.knime.core.node.NodeDialogPane}.
*
* @author Pierre Lindenbaum
*/
public class ReadFastaNodeDialog extends DefaultNodeSettingsPane {
/**
* New pane for configuring ReadFasta node dialog.
* This is just a suggestion to demonstrate possible default dialog
* components.
*/
protected ReadFastaNodeDialog() {
super();
addDialogComponent(new DialogComponentFileChooser(
new SettingsModelString(
ReadFastaNodeModel.CFGKEY_FILE,
""
),
ReadFastaNodeModel.CFGKEY_FILE,
JFileChooser.OPEN_DIALOG,
".fa",".fasta",".txt"));
}
}
package org.lindenb.knime.readfasta;
import org.knime.core.node.NodeDialogPane;
import org.knime.core.node.NodeFactory;
import org.knime.core.node.NodeView;
/**
* <code>NodeFactory</code> for the "ReadFasta" Node.
* Read fasta files
*
* @author Pierre Lindenbaum
*/
public class ReadFastaNodeFactory
extends NodeFactory<ReadFastaNodeModel> {
/**
* {@inheritDoc}
*/
@Override
public ReadFastaNodeModel createNodeModel() {
return new ReadFastaNodeModel();
}
/**
* {@inheritDoc}
*/
@Override
public int getNrNodeViews() {
return 0;
}
/**
* {@inheritDoc}
*/
@Override
public NodeView<ReadFastaNodeModel> createNodeView(final int viewIndex,
final ReadFastaNodeModel nodeModel) {
throw new IllegalStateException();
}
/**
* {@inheritDoc}
*/
@Override
public boolean hasDialog() {
return true;
}
/**
* {@inheritDoc}
*/
@Override
public NodeDialogPane createNodeDialogPane() {
return new ReadFastaNodeDialog();
}
}
package org.lindenb.knime.readfasta;
import java.io.File;
import java.io.IOException;
import org.knime.core.data.DataCell;
import org.knime.core.data.DataColumnSpec;
import org.knime.core.data.DataColumnSpecCreator;
import org.knime.core.data.DataRow;
import org.knime.core.data.DataTableSpec;
import org.knime.core.data.RowKey;
import org.knime.core.data.def.DefaultRow;
import org.knime.core.data.def.DoubleCell;
import org.knime.core.data.def.IntCell;
import org.knime.core.data.def.StringCell;
import org.knime.core.node.BufferedDataContainer;
import org.knime.core.node.BufferedDataTable;
import org.knime.core.node.CanceledExecutionException;
import org.knime.core.node.defaultnodesettings.SettingsModelIntegerBounded;
import org.knime.core.node.defaultnodesettings.SettingsModelString;
import org.knime.core.node.ExecutionContext;
import org.knime.core.node.ExecutionMonitor;
import org.knime.core.node.InvalidSettingsException;
import org.knime.core.node.NodeLogger;
import org.knime.core.node.NodeModel;
import org.knime.core.node.NodeSettingsRO;
import org.knime.core.node.NodeSettingsWO;
/**
* This is the model implementation of ReadFasta.
* Read fasta files
*
* @author Pierre Lindenbaum
*/
public class ReadFastaNodeModel extends NodeModel {
/** the settings key which is used to retrieve and
store the settings (from the dialog or from a settings file)
(package visibility to be usable from the dialog). */
static final String CFGKEY_FILE = "fasta.file.name";
// path of the FASTA file, filled from the dialog and used in execute().
// The default components of the dialog work with "SettingsModels".
private final SettingsModelString fileInput =
new SettingsModelString(CFGKEY_FILE,"");
/**
* Constructor for the node model.
*/
protected ReadFastaNodeModel() {
super(0, 1);
}
/**
* {@inheritDoc}
*/
@Override
protected BufferedDataTable[] execute(final BufferedDataTable[] inData,
final ExecutionContext exec) throws Exception
{
// wrap the FASTA file in a DataTable and let KNIME buffer its rows
System.err.println("execute called");
String fname=fileInput.getStringValue();
System.err.println("execute called fname="+fname);
FastaTable out = new FastaTable(fname,exec);
return new BufferedDataTable[]{exec.createBufferedDataTable(out, exec)};
}
/**
* {@inheritDoc}
*/
@Override
protected void reset() {
System.err.println("ReadFastaModel: reset called");
}
/**
* {@inheritDoc}
*/
@Override
protected DataTableSpec[] configure(final DataTableSpec[] inSpecs)
throws InvalidSettingsException
{
System.err.println("ReadFastaModel: configure called");
return new DataTableSpec[]{FastaTable.createDataTableSpec()};
}
/**
* {@inheritDoc}
*/
@Override
protected void saveSettingsTo(final NodeSettingsWO settings)
{
System.err.println("ReadFastaModel: saveSettingsTo called "+ this.fileInput);
this.fileInput.saveSettingsTo(settings);
}
/**
* {@inheritDoc}
*/
@Override
protected void loadValidatedSettingsFrom(final NodeSettingsRO settings)
throws InvalidSettingsException {
this.fileInput.loadSettingsFrom(settings);
System.err.println("ReadFastaModel: loadValidatedSettingsFrom called input="+this.fileInput);
}
/**
* {@inheritDoc}
*/
@Override
protected void validateSettings(final NodeSettingsRO settings)
throws InvalidSettingsException
{
this.fileInput.validateSettings(settings);
}
/**
* {@inheritDoc}
*/
@Override
protected void loadInternals(final File internDir,
final ExecutionMonitor exec) throws IOException,
CanceledExecutionException {
}
/**
* {@inheritDoc}
*/
@Override
protected void saveInternals(final File internDir,
final ExecutionMonitor exec) throws IOException,
CanceledExecutionException {
}
}
/* @(#)$RCSfile$
* $Revision$ $Date$ $Author$
*
*/
package org.lindenb.knime.readfasta;
import org.eclipse.core.runtime.Plugin;
import org.osgi.framework.BundleContext;
/**
* This is the eclipse bundle activator.
* Note: KNIME node developers probably won't have to do anything in here,
* as this class is only needed by the eclipse platform/plugin mechanism.
* If you want to move/rename this file, make sure to change the plugin.xml
* file in the project root directory accordingly.
*
* @author Pierre Lindenbaum
*/
public class ReadFastaNodePlugin extends Plugin {
/** Make sure that this *always* matches the ID in plugin.xml. */
public static final String PLUGIN_ID = "org.lindenb.knime.readfasta";
// The shared instance.
private static ReadFastaNodePlugin plugin;
/**
* The constructor.
*/
public ReadFastaNodePlugin() {
super();
plugin = this;
}
/**
* This method is called upon plug-in activation.
*
* @param context The OSGI bundle context
* @throws Exception If this plugin could not be started
*/
@Override
public void start(final BundleContext context) throws Exception {
super.start(context);
}
/**
* This method is called when the plug-in is stopped.
*
* @param context The OSGI bundle context
* @throws Exception If this plugin could not be stopped
*/
@Override
public void stop(final BundleContext context) throws Exception {
super.stop(context);
plugin = null;
}
/**
* Returns the shared instance.
*
* @return Singleton instance of the Plugin
*/
public static ReadFastaNodePlugin getDefault() {
return plugin;
}
}
package org.lindenb.knime.readfasta;
import org.knime.core.node.NodeView;
/**
* <code>NodeView</code> for the "ReadFasta" Node.
* Read fasta files
*
* @author Pierre Lindenbaum
*/
public class ReadFastaNodeView extends NodeView<ReadFastaNodeModel> {
/**
* Creates a new view.
*
* @param nodeModel The model (class: {@link ReadFastaNodeModel})
*/
protected ReadFastaNodeView(final ReadFastaNodeModel nodeModel) {
super(nodeModel);
// TODO instantiate the components of the view here.
}
/**
* {@inheritDoc}
*/
@Override
protected void modelChanged() {
// TODO retrieve the new model from your nodemodel and
// update the view.
ReadFastaNodeModel nodeModel =
(ReadFastaNodeModel)getNodeModel();
assert nodeModel != null;
// be aware of a possibly not executed nodeModel! The data you retrieve
// from your nodemodel could be null, empty, or invalid in any way.
}
/**
* {@inheritDoc}
*/
@Override
protected void onClose() {
// TODO things to do when closing the view
}
/**
* {@inheritDoc}
*/
@Override
protected void onOpen() {
// TODO things to do when opening the view
}
}

Here is a screenshot: my node reads a FASTA file and transforms it into a two-column table; the following nodes 'grep' all the sequences containing the word ONCOGENE, sort the sequences and output the result in a table (smaller window at the bottom).
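For readers without KNIME at hand, the same filter-then-sort pipeline can be sketched in plain Java. This is not how the KNIME nodes are implemented: the class GrepAndSort is hypothetical, and two assumptions are made, namely that the word is searched (case-sensitively) in the Name column and that the rows are sorted by name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper mimicking the 'grep' and 'sort' steps on {name, sequence} rows. */
class GrepAndSort {
    static List<String[]> grepAndSort(List<String[]> rows, String word) {
        List<String[]> kept = new ArrayList<String[]>();
        for (String[] row : rows) {
            // assumption: the word is looked up in the Name column (row[0])
            if (row[0].contains(word)) kept.add(row);
        }
        // assumption: rows are ordered by name
        Collections.sort(kept, new Comparator<String[]>() {
            public int compare(String[] a, String[] b) {
                return a[0].compareTo(b[0]);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<String[]>();
        rows.add(new String[]{"seqB ONCOGENE", "ACGT"});
        rows.add(new String[]{"seqC kinase", "GGCC"});
        rows.add(new String[]{"seqA ONCOGENE", "TTAA"});
        for (String[] row : grepAndSort(rows, "ONCOGENE")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```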







That's it for tonight.

Pierre