This post is about how I wrote a Java plugin for the workflow engine KNIME (http://www.knime.org). This plugin reads a FASTA file containing one or more sequences and transforms it into a table containing two columns: one for the name of the sequence and the other for the sequence itself.
Over the last few weeks, I've been looking for a workflow engine that could be easily handled by the members of my lab. I tested three tools:
- Taverna (now in version 2) is mainly devoted to web services. I've tried to learn how to use Taverna a few times, but I still don't find it user-friendly or intuitive
- http://kepler-project.org/: 141 MB! Ouch!
- http://www.knime.org: until the latest version, the development version crashed at startup on my computer, but now everything works fine
KNIME (the Konstanz Information Miner) is a modular environment that enables easy visual assembly and interactive execution of a data pipeline. KNIME is built on Eclipse, and each node in the workflow is developed as a plugin. As far as I understand KNIME, it only handles tabular data, and that's why I wrote this new plugin, which converts a set of FASTA files into a table. The KNIME SDK comes with a wizard that creates the Java stubs required for a new KNIME node. Here are a few of the files:
- The node model: implements the logic of the node. The most important method is
  BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
  which takes as input one or more tables, transforms them and returns an array of one or more tables
- The node dialog: a Swing-based dialog used to select the options of a node. Here, I've created a dialog selecting the FASTA file
- The node view: visualizes the result of the node. Here, I didn't write a view, but one could imagine a graphical interface drawing the GC%, etc.
- The node factory: a class generating the model, the dialog, the views, ...
- The plugin class: describes the plugin. It is only used by Eclipse
The sources I wrote are available here: warning, the sources are a draft and I'm still learning the KNIME API, so I guess there must be a cleverer/smarter/safer way to write this stuff
- FastaIterator.java: iterator over a fasta file
- FastaTable.java: the tabular representation of the sequences. It contains two columns: Name and Sequence
- ReadFastaNodeDialog.java: the dialog selecting the fasta file
- ReadFastaNodeFactory.java: the node factory
- ReadFastaNodeModel.java: implements the logic of the node. Creates and returns the FastaTable
- ReadFastaNodePlugin: used by Eclipse
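For readers who just want the idea without the KNIME plumbing, here is a minimal plain-Java sketch of the FASTA-to-two-columns conversion. It is not the FastaIterator or FastaTable code listed above: the class name FastaRows and its read methods are made up for this illustration, and each row is simply a {name, sequence} pair of strings. I'm assuming that a header line starts with '>', that a sequence may span several lines, and that blank lines can be skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the plugin sources. */
final class FastaRows {
    private FastaRows() {}

    /** Reads FASTA text and returns one {name, sequence} pair per record. */
    static List<String[]> read(Reader in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String name = null;                         // header of the record being read
        StringBuilder sequence = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;                           // assumption: blank lines carry no data
            }
            if (line.startsWith(">")) {
                if (name != null) {                 // a new header closes the previous record
                    rows.add(new String[] {name, sequence.toString()});
                }
                name = line.substring(1).trim();
                sequence.setLength(0);
            } else if (name != null) {
                sequence.append(line);              // a sequence may span several lines
            }
            // sequence lines that appear before any header are ignored
        }
        if (name != null) {                         // flush the last record
            rows.add(new String[] {name, sequence.toString()});
        }
        return rows;
    }

    /** Convenience overload for in-memory text; wraps the checked exception. */
    static List<String[]> read(String fasta) {
        try {
            return read(new StringReader(fasta));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling FastaRows.read on a text with two records gives a list of two rows, where row[0] is the Name column and row[1] is the Sequence column. In a real node, each row would presumably be turned into a KNIME table row instead of a String array.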
Here is a screenshot: my node reads a FASTA file and transforms it into a two-column table; the following nodes 'grep' all the sequences containing the word ONCOGENE, sort the sequences and output the result in a table (smaller window at the bottom).
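To make the workflow in the screenshot concrete, here is roughly what the 'grep' and sort steps do, written as plain Java rather than as KNIME nodes. The class OncogeneFilter and its method are hypothetical names for this sketch. I'm also assuming that the keyword is matched against the Name column (case-sensitive, like a default grep) and that the rows are sorted by name; the real workflow nodes may be configured differently.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for the filter and sort nodes of the workflow. */
final class OncogeneFilter {
    private OncogeneFilter() {}

    /**
     * Keeps the {name, sequence} rows whose name contains the keyword,
     * then sorts the remaining rows by name.
     */
    static List<String[]> filterAndSort(List<String[]> rows, String keyword) {
        return rows.stream()
                .filter(row -> row[0].contains(keyword))               // the 'grep' step
                .sorted(Comparator.comparing((String[] row) -> row[0])) // the sort step
                .collect(Collectors.toList());                          // the output table
    }
}
```

Given rows named "zeta ONCOGENE homolog", "actin" and "alpha ONCOGENE", a call to OncogeneFilter.filterAndSort(rows, "ONCOGENE") drops "actin" and returns the two remaining rows in alphabetical order of their names.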
That's it for tonight.