22 December 2008

Knime.org: creating a new Source Node reading Fasta sequences.

This post is about how I wrote a Java plugin for the workflow engine KNIME (http://www.knime.org). This plugin reads a FASTA file containing one or more sequences and transforms it into a table containing two columns: one for the name of the sequence and the other for the sequence itself.
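
To make the transformation concrete, here is a minimal sketch in plain Java, without any KNIME class. It is not the plugin's code: the class name FastaToRows and the method parse are hypothetical, and each row is simply modelled as a String[] {name, sequence}.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch (no KNIME classes): FASTA text in, two-column rows out. */
class FastaToRows {
    /** Returns one String[]{name, sequence} per FASTA record. */
    static List<String[]> parse(String fasta) {
        List<String[]> rows = new ArrayList<>();
        String name = null;                       // name of the record being read
        StringBuilder seq = new StringBuilder();  // its sequence, joined over lines
        for (String line : fasta.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                // a new header closes the previous record, if any
                if (name != null) rows.add(new String[] {name, seq.toString()});
                name = line.substring(1).trim();
                seq.setLength(0);
            } else if (name == null) {
                throw new IllegalArgumentException("expected '>'");
            } else {
                seq.append(line);
            }
        }
        if (name != null) rows.add(new String[] {name, seq.toString()});
        return rows;
    }

    public static void main(String[] args) {
        String fasta = ">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n";
        for (String[] row : parse(fasta)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

Each record becomes one row, and a sequence split over several lines is joined into a single string, which is what the two-column table needs.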

Over the last few weeks, I've been looking for a workflow engine that could be easily handled by the members of my lab. I tested three tools:

  • Taverna (now in version 2) is mainly devoted to web services. I've tried to learn how to use Taverna a few times, but I still find it neither user-friendly nor intuitive.
  • http://kepler-project.org/: 141 MB! Ouch!
  • http://www.knime.org: until the latest version, the development version crashed at startup on my computer, but now everything works fine.

KNIME, the Konstanz Information Miner, is a modular environment that enables easy visual assembly and interactive execution of a data pipeline. KNIME is built on top of Eclipse, and each node in the workflow is developed as a plugin. As far as I understand KNIME, it only handles tabular data, and that's why I wrote this new plugin converting FASTA files into a table. The KNIME SDK comes with a wizard that creates the Java stubs required for a new KNIME node. Here are a few of those files:

ReadFastaNodeModel.java implements the logic of the node. The most important method,
BufferedDataTable[] execute(final BufferedDataTable[] inData,
final ExecutionContext exec) throws Exception
takes one or more tables as input, transforms them and returns an array of one or more tables.

ReadFastaNodeDialog.java is a Swing-based dialog used to select the options of a node. Here, I've created a dialog selecting the FASTA file.

ReadFastaNodeView.java visualizes the result of the node. Here, I didn't write a view, but one could imagine a graphical interface drawing the GC%, etc.

ReadFastaNodeFactory.java is a class generating the model, the dialog, the views, ...

ReadFastaNodePlugin.java describes the plugin. It is only used by Eclipse.
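
As an illustration of what such a view could compute, here is a small sketch of a GC% calculation in plain Java. It is not part of the plugin: the class GcContent and the method gcPercent are hypothetical names, and the sketch assumes that every non-whitespace character counts as a base.

```java
/** Hypothetical helper, not part of the plugin: GC content of one sequence. */
class GcContent {
    /** Returns the percentage of G and C among the non-whitespace characters. */
    static double gcPercent(String sequence) {
        int gc = 0;
        int total = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (Character.isWhitespace(c)) continue;
            total++;
            if (c == 'G' || c == 'C') gc++;
        }
        // an empty sequence has no defined GC%; this sketch reports 0
        return total == 0 ? 0.0 : 100.0 * gc / total;
    }

    public static void main(String[] args) {
        System.out.println(gcPercent("ACGT"));
    }
}
```

A view would call something like this on the Sequence column of each row and draw the resulting values.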

The sources I wrote are available here (warning: the sources are a draft and I'm still learning the KNIME API; I guess there must be a cleverer/smarter/safer way to write this stuff):
  • FastaIterator.java: iterator over a fasta file
  • FastaTable.java: the tabular representation of the sequences. It contains two columns Name and Sequence
  • ReadFastaNodeDialog.java: the dialog selecting the fasta file
  • ReadFastaNodeFactory.java: the node factory
  • ReadFastaNodeModel.java: implements the logic of the node. Creates and returns the FastaTable
  • ReadFastaNodePlugin.java: used by Eclipse
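
The iterator over the FASTA file relies on a look-ahead: hasNext() has to read the next record before it can answer, so it caches what it read and next() hands the cached value out. The pattern can be sketched on its own in plain Java; the class LookAheadLines below is a hypothetical example that skips blank strings, not the plugin's iterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Look-ahead sketch: hasNext() fetches and caches, next() hands out the cache. */
class LookAheadLines implements Iterator<String> {
    private final Iterator<String> source;
    private String cached;              // value fetched by the last hasNext()
    private boolean tested = false;     // true when 'available' is up to date
    private boolean available = false;  // answer of the last look-ahead

    LookAheadLines(Iterator<String> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (tested) return available;   // repeated calls do not consume input
        tested = true;
        available = false;
        while (source.hasNext()) {
            String s = source.next().trim();
            if (!s.isEmpty()) {         // skip blank entries
                cached = s;
                available = true;
                break;
            }
        }
        return available;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        tested = false;                 // force a new look-ahead next time
        return cached;
    }

    public static void main(String[] args) {
        Iterator<String> it = new LookAheadLines(Arrays.asList("a", "", "b").iterator());
        while (it.hasNext()) System.out.println(it.next());
    }
}
```

The two flags play the same role as has_next_tested and has_next in FastaIterator: calling hasNext() several times in a row must not skip a record.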

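package org.lindenb.knime.readfasta;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PushbackReader;

import org.knime.core.data.DataCell;
import org.knime.core.data.DataRow;
import org.knime.core.data.RowIterator;
import org.knime.core.data.RowKey;
import org.knime.core.data.def.DefaultRow;
import org.knime.core.data.def.StringCell;

/** Iterates over a FASTA file, producing one row (name, sequence) per record. */
public class FastaIterator extends RowIterator {
    private PushbackReader reader = null;
    StringBuilder name = new StringBuilder();
    StringBuilder seq = new StringBuilder();
    int rowIndex = 0;
    private boolean has_next_tested = false;
    private boolean has_next = false;

    FastaIterator(String source) {
        try {
            this.reader = new PushbackReader(new BufferedReader(new FileReader(source)));
        } catch(IOException err) {
            // assumption: fail fast when the file cannot be opened
            throw new RuntimeException(err);
        }
    }

    public boolean hasNext() {
        System.err.println("hasNext called");
        if(has_next_tested) return has_next;
        has_next_tested = true;
        has_next = false;
        if(reader == null) return has_next;
        name.setLength(0);
        seq.setLength(0);
        try {
            int c;
            // skip whitespace; the first significant character must be '>'
            while((c = reader.read()) != -1) {
                if(Character.isWhitespace(c)) continue;
                if(c != '>') throw new IOException("expected '>'");
                break;
            }
            if(c == -1) { // end of file: no more sequences
                reader.close();
                reader = null;
                return has_next;
            }
            // the rest of the header line is the name
            while((c = reader.read()) != -1) {
                if(c == '\n') break;
                name.append((char)c);
            }
            boolean at_start = true;
            System.err.println("Found >" + name);
            // read the sequence until a '>' at the start of a line, or the end of the file
            while((c = reader.read()) != -1) {
                if(at_start && c == '>') {
                    reader.unread(c); // leave the next header for the next call
                    break;
                } else if(c == '\n') {
                    at_start = true;
                    continue;
                }
                at_start = false;
                if(Character.isWhitespace(c)) continue;
                seq.append((char)c);
            }
            System.err.println("Found :" + seq);
            has_next = true;
        } catch(IOException err) {
            try { reader.close(); reader = null; } catch(IOException err2) {}
            has_next = false;
        }
        System.err.println("name & seq: " + name + " " + seq);
        return has_next;
    }

    public DataRow next() {
        if(!has_next_tested) hasNext();
        if(!has_next) throw new IllegalStateException("No next Fasta sequence");
        DataCell cellName = new StringCell(this.name.toString().trim());
        DataCell cellSeq = new StringCell(this.seq.toString());
        int index = rowIndex;
        rowIndex++;
        has_next_tested = false; // force a new look-ahead on the next call
        return new DefaultRow(RowKey.createRowKey(index), cellName, cellSeq);
    }
}
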
package org.lindenb.knime.readfasta;

import org.knime.core.data.DataColumnSpec;
import org.knime.core.data.DataColumnSpecCreator;
import org.knime.core.data.DataTable;
import org.knime.core.data.DataTableSpec;
import org.knime.core.data.DataType;
import org.knime.core.data.RowIterator;
import org.knime.core.data.def.StringCell;
import org.knime.core.node.ExecutionContext;

/** Tabular representation of a FASTA file: two columns, Name and Sequence. */
public class FastaTable implements DataTable {
    private DataTableSpec dataTableSpec = createDataTableSpec();
    private ExecutionContext exec;
    private String filename;

    public FastaTable(String filename, final ExecutionContext exec) {
        this.filename = filename;
        this.exec = exec;
    }

    /** Describes the two string columns of the table. */
    public static DataTableSpec createDataTableSpec() {
        DataColumnSpec name = new DataColumnSpecCreator("Name", DataType.getType(StringCell.class)).createSpec();
        DataColumnSpec seq = new DataColumnSpecCreator("Sequence", DataType.getType(StringCell.class)).createSpec();
        return new DataTableSpec(name, seq);
    }

    public DataTableSpec getDataTableSpec() {
        return this.dataTableSpec;
    }

    public RowIterator iterator() {
        System.err.println("FastaTable called with filename=" + filename);
        return new FastaIterator(this.filename);
    }
}

package org.lindenb.knime.readfasta;

import javax.swing.JFileChooser;
import org.knime.core.node.defaultnodesettings.DefaultNodeSettingsPane;
import org.knime.core.node.defaultnodesettings.DialogComponentFileChooser;
import org.knime.core.node.defaultnodesettings.SettingsModelString;

/**
 * <code>NodeDialog</code> for the "ReadFasta" Node.
 * Read fasta files
 *
 * This node dialog derives from {@link DefaultNodeSettingsPane} which allows
 * creation of a simple dialog with standard components. If you need a more
 * complex dialog please derive directly from
 * {@link org.knime.core.node.NodeDialogPane}.
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodeDialog extends DefaultNodeSettingsPane {
    /**
     * New pane for configuring ReadFasta node dialog.
     * This is just a suggestion to demonstrate possible default dialog
     * components.
     */
    protected ReadFastaNodeDialog() {
        addDialogComponent(new DialogComponentFileChooser(
            new SettingsModelString(
                ReadFastaNodeModel.CFGKEY_FILE, ""),
            "org.lindenb.knime.readfasta.history", // history ID (assumed value)
            JFileChooser.OPEN_DIALOG,
            ".fa", ".fasta"));                     // accepted extensions (assumed values)
    }
}

package org.lindenb.knime.readfasta;

import org.knime.core.node.NodeDialogPane;
import org.knime.core.node.NodeFactory;
import org.knime.core.node.NodeView;

/**
 * <code>NodeFactory</code> for the "ReadFasta" Node.
 * Read fasta files
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodeFactory
        extends NodeFactory<ReadFastaNodeModel> {
    /** {@inheritDoc} */
    @Override
    public ReadFastaNodeModel createNodeModel() {
        return new ReadFastaNodeModel();
    }

    /** {@inheritDoc} */
    @Override
    public int getNrNodeViews() {
        return 0;
    }

    /** {@inheritDoc} */
    @Override
    public NodeView<ReadFastaNodeModel> createNodeView(final int viewIndex,
            final ReadFastaNodeModel nodeModel) {
        // the node declares no view, so this should never be called
        throw new IllegalStateException();
    }

    /** {@inheritDoc} */
    @Override
    public boolean hasDialog() {
        return true;
    }

    /** {@inheritDoc} */
    @Override
    public NodeDialogPane createNodeDialogPane() {
        return new ReadFastaNodeDialog();
    }
}

package org.lindenb.knime.readfasta;

import java.io.File;
import java.io.IOException;

import org.knime.core.data.DataTableSpec;
import org.knime.core.node.BufferedDataTable;
import org.knime.core.node.CanceledExecutionException;
import org.knime.core.node.ExecutionContext;
import org.knime.core.node.ExecutionMonitor;
import org.knime.core.node.InvalidSettingsException;
import org.knime.core.node.NodeModel;
import org.knime.core.node.NodeSettingsRO;
import org.knime.core.node.NodeSettingsWO;
import org.knime.core.node.defaultnodesettings.SettingsModelString;

/**
 * This is the model implementation of ReadFasta.
 * Read fasta files
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodeModel extends NodeModel {
    /** the settings key which is used to retrieve and
        store the settings (from the dialog or from a settings file)
        (package visibility to be usable from the dialog). */
    static final String CFGKEY_FILE = "fasta.file.name";

    // the FASTA file name, filled from the dialog and used in the
    // model's execute method. The default components of the
    // dialog work with "SettingsModels".
    private final SettingsModelString fileInput =
        new SettingsModelString(CFGKEY_FILE, "");

    /** Constructor for the node model: no input port, one output port. */
    protected ReadFastaNodeModel() {
        super(0, 1);
    }

    /** {@inheritDoc} */
    @Override
    protected BufferedDataTable[] execute(final BufferedDataTable[] inData,
            final ExecutionContext exec) throws Exception {
        System.err.println("execute called");
        String fname = fileInput.getStringValue();
        System.err.println("execute called fname=" + fname);
        FastaTable out = new FastaTable(fname, exec);
        return new BufferedDataTable[]{exec.createBufferedDataTable(out, exec)};
    }

    /** {@inheritDoc} */
    @Override
    protected void reset() {
        System.err.println("ReadFastaModel: reset called");
    }

    /** {@inheritDoc} */
    @Override
    protected DataTableSpec[] configure(final DataTableSpec[] inSpecs)
            throws InvalidSettingsException {
        System.err.println("ReadFastaModel: configure called");
        return new DataTableSpec[]{FastaTable.createDataTableSpec()};
    }

    /** {@inheritDoc} */
    @Override
    protected void saveSettingsTo(final NodeSettingsWO settings) {
        fileInput.saveSettingsTo(settings);
        System.err.println("ReadFastaModel: saveSettingsTo called " + this.fileInput);
    }

    /** {@inheritDoc} */
    @Override
    protected void loadValidatedSettingsFrom(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        fileInput.loadSettingsFrom(settings);
        System.err.println("ReadFastaModel: loadValidatedSettingsFrom called input=" + this.fileInput);
    }

    /** {@inheritDoc} */
    @Override
    protected void validateSettings(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        fileInput.validateSettings(settings);
    }

    /** {@inheritDoc} */
    @Override
    protected void loadInternals(final File internDir,
            final ExecutionMonitor exec) throws IOException,
            CanceledExecutionException {
        // nothing to load
    }

    /** {@inheritDoc} */
    @Override
    protected void saveInternals(final File internDir,
            final ExecutionMonitor exec) throws IOException,
            CanceledExecutionException {
        // nothing to save
    }
}

/* @(#)$RCSfile$
 * $Revision$ $Date$ $Author$
 */
package org.lindenb.knime.readfasta;

import org.eclipse.core.runtime.Plugin;
import org.osgi.framework.BundleContext;

/**
 * This is the eclipse bundle activator.
 * Note: KNIME node developers probably won't have to do anything in here,
 * as this class is only needed by the eclipse platform/plugin mechanism.
 * If you want to move/rename this file, make sure to change the plugin.xml
 * file in the project root directory accordingly.
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodePlugin extends Plugin {
    /** Make sure that this *always* matches the ID in plugin.xml. */
    public static final String PLUGIN_ID = "org.lindenb.knime.readfasta";

    // The shared instance.
    private static ReadFastaNodePlugin plugin;

    /** The constructor. */
    public ReadFastaNodePlugin() {
        super();
        plugin = this;
    }

    /**
     * This method is called upon plug-in activation.
     * @param context The OSGI bundle context
     * @throws Exception If this plugin could not be started
     */
    @Override
    public void start(final BundleContext context) throws Exception {
        super.start(context);
    }

    /**
     * This method is called when the plug-in is stopped.
     * @param context The OSGI bundle context
     * @throws Exception If this plugin could not be stopped
     */
    @Override
    public void stop(final BundleContext context) throws Exception {
        super.stop(context);
        plugin = null;
    }

    /**
     * Returns the shared instance.
     * @return Singleton instance of the Plugin
     */
    public static ReadFastaNodePlugin getDefault() {
        return plugin;
    }
}

package org.lindenb.knime.readfasta;

import org.knime.core.node.NodeView;

/**
 * <code>NodeView</code> for the "ReadFasta" Node.
 * Read fasta files
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodeView extends NodeView<ReadFastaNodeModel> {
    /**
     * Creates a new view.
     * @param nodeModel The model (class: {@link ReadFastaNodeModel})
     */
    protected ReadFastaNodeView(final ReadFastaNodeModel nodeModel) {
        super(nodeModel);
        // TODO instantiate the components of the view here.
    }

    /** {@inheritDoc} */
    @Override
    protected void modelChanged() {
        // TODO retrieve the new model from your nodemodel and
        // update the view.
        ReadFastaNodeModel nodeModel =
            (ReadFastaNodeModel)getNodeModel();
        assert nodeModel != null;
        // be aware of a possibly not executed nodeModel! The data you retrieve
        // from your nodemodel could be null, empty, or invalid in any kind.
    }

    /** {@inheritDoc} */
    @Override
    protected void onClose() {
        // TODO things to do when closing the view
    }

    /** {@inheritDoc} */
    @Override
    protected void onOpen() {
        // TODO things to do when opening the view
    }
}

Here is a screenshot: my node reads a FASTA file and transforms it into a two-column table; the following nodes 'grep' all the sequences containing the word ONCOGENE, sort the sequences and output the result in a table (smaller window at the bottom).
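
For readers without KNIME at hand, the same chain of steps can be sketched in plain Java. This is only an analogy of the workflow, not what the KNIME nodes run internally. The class OncogenePipeline and the method grepAndSort are hypothetical names, and the sketch assumes that the word is searched in the Name column and that the rows are sorted by name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java analogy of the workflow: filter the rows on a word, then sort them. */
class OncogenePipeline {
    /** Rows are String[]{name, sequence}; keeps the rows whose name contains the word. */
    static List<String[]> grepAndSort(List<String[]> rows, String word) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (row[0].contains(word)) kept.add(row);   // the 'grep' step
        }
        // the 'sort' step (assumption: sorted on the Name column)
        kept.sort(Comparator.comparing((String[] row) -> row[0]));
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"zeta ONCOGENE", "AAA"},
            new String[] {"alpha kinase", "CCC"},
            new String[] {"beta ONCOGENE", "GGG"});
        for (String[] row : grepAndSort(rows, "ONCOGENE")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```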

That's it for tonight.
