22 December 2008

Knime.org: creating a new Source Node reading Fasta sequences.

This post is about how I wrote a Java plugin for the workflow engine KNIME (http://www.knime.org). This plugin reads a FASTA file containing one or more sequences and transforms it into a table containing two columns: one for the name of the sequence and the other for the sequence itself.

Over the last few weeks, I've been looking for a workflow engine that could be easily handled by the members of my lab. I tested three tools:

  • Taverna (now in version 2) is mainly devoted to web services. I've tried to learn how to use Taverna a few times but I still don't find it user-friendly or intuitive
  • http://kepler-project.org/: 141 MB! Ouch!
  • http://www.knime.org: until the latest version, the development version crashed at startup on my computer, but now everything works fine

KNIME (the Konstanz Information Miner) is a modular environment which enables easy visual assembly and interactive execution of a data pipeline. KNIME is built on top of Eclipse and each node in the workflow is developed as a plugin. As far as I understand KNIME, it only handles tabular data and that's why I wrote this new plugin converting a set of FASTA sequences into a table. The KNIME SDK comes with a wizard that creates the Java stubs required for a new KNIME node. Here are a few of those files:

ReadFastaNodeModel.java implements the logic of the node. The most important method
    BufferedDataTable[] execute(final BufferedDataTable[] inData,
            final ExecutionContext exec) throws Exception
takes as input one or more tables, transforms them and returns an array of one or more tables.

ReadFastaNodeDialog.java is a Swing-based dialog used to select the options of a node. Here, I've created a dialog selecting the FASTA file.

The node view visualizes the result of the node. Here, I didn't write a view but one could imagine a graphical interface drawing the GC%, etc.
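To make the GC% idea concrete, here is a minimal sketch of the number such a view could plot for each sequence. The class and method names (GcContentSketch, gcPercent) are hypothetical and not part of my plugin or of the KNIME API; the sketch also assumes that every character of the string counts as a sequence symbol.

```java
/** Hypothetical helper, not part of the plugin: computes the GC% of a sequence. */
public class GcContentSketch {

    /**
     * Returns the percentage of G and C symbols in the sequence (case-insensitive).
     * Assumption: every character counts toward the length; an empty or null
     * sequence yields 0.0.
     */
    public static double gcPercent(String sequence) {
        if (sequence == null || sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return 100.0 * gc / sequence.length();
    }

    public static void main(String[] args) {
        // Two of the four symbols are G or C.
        System.out.println(gcPercent("ACGT"));
    }
}
```

A real view would compute this value for each row of the table and draw it, for instance as a bar per sequence.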

ReadFastaNodeFactory.java is a class generating the model, the dialog, the views, ...

ReadFastaNodePlugin describes the plugin. It is only used by Eclipse.

The sources I wrote are available here. Warning: the sources are a draft and I'm still learning the KNIME API; I guess there must be a cleverer/smarter/safer way to write this stuff.
  • FastaIterator.java: iterator over a fasta file
  • FastaTable.java: the tabular representation of the sequences. It contains two columns Name and Sequence
  • ReadFastaNodeDialog.java: the dialog selecting the fasta file
  • ReadFastaNodeFactory.java: the node factory
  • ReadFastaNodeModel.java: implements the logic of the node. Creates and returns the FastaTable
  • /ReadFastaNodePlugin: used by eclipse
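For readers who don't want to open the sources, the reading step can be sketched in plain Java as below. This is not my actual FastaIterator/FastaTable code: the class FastaSketch and its method readFasta are hypothetical stand-ins, and each row is simply a String[] holding {Name, Sequence} instead of a KNIME row.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the FASTA reading step; not the original sources. */
public class FastaSketch {

    /**
     * Reads FASTA records and returns one row per sequence: {name, sequence}.
     * The name is the header line without its leading '>'; sequence lines are
     * concatenated. Blank lines and lines before the first header are ignored.
     */
    public static List<String[]> readFasta(BufferedReader in) throws IOException {
        List<String[]> rows = new ArrayList<String[]>();
        String name = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if any.
                if (name != null) {
                    rows.add(new String[] { name, seq.toString() });
                }
                name = line.substring(1).trim();
                seq.setLength(0);
            } else if (name != null) {
                seq.append(line);
            }
        }
        // Flush the last record.
        if (name != null) {
            rows.add(new String[] { name, seq.toString() });
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n";
        for (String[] row : readFasta(new BufferedReader(new StringReader(fasta)))) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

In the real node, each of these rows would be turned into a table row with two string cells inside execute().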

Here is a screenshot of a workflow: my node reads a FASTA file and transforms it into a two-column table; the following nodes 'grep' all the sequences containing the word ONCOGENE, sort the sequences and output the result in a table (smaller window at the bottom).
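Outside of KNIME, the same filter-then-sort pipeline could be sketched in a few lines of Java. This is only an illustration of what the workflow does, under two assumptions of mine: the word is searched (case-sensitively) in the Name column, and the rows are sorted by name. FilterSortSketch and grepAndSort are hypothetical names, and rows are String[] pairs {Name, Sequence}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of the 'grep' + sort steps of the workflow. */
public class FilterSortSketch {

    /**
     * Keeps the rows whose name (column 0) contains the given word,
     * then sorts the kept rows by name. The input list is not modified.
     */
    public static List<String[]> grepAndSort(List<String[]> rows, String word) {
        List<String[]> kept = new ArrayList<String[]>();
        for (String[] row : rows) {
            if (row[0].contains(word)) {
                kept.add(row);
            }
        }
        Collections.sort(kept, new Comparator<String[]>() {
            public int compare(String[] a, String[] b) {
                return a[0].compareTo(b[0]);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] { "seqB ONCOGENE", "ACGT" },
                new String[] { "seqC kinase", "GGCC" },
                new String[] { "seqA ONCOGENE", "TTGA" });
        for (String[] row : grepAndSort(rows, "ONCOGENE")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```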

That's it for tonight.