Knime.org: creating a new Source Node reading Fasta sequences.
This post is about how I wrote a Java plugin for the workflow engine KNIME (http://www.knime.org). The plugin reads a FASTA file containing one or more sequences and transforms it into a table containing two columns: one for the name of the sequence and the other for the sequence itself.
In the last few weeks, I've been looking for a workflow engine that could be easily handled by the members of my lab. I tested three tools:
- Taverna (now in version 2) is mainly devoted to web services. I've tried to learn how to use Taverna a few times, but I still find it neither user-friendly nor intuitive
- http://kepler-project.org/: 141 MB! Ouch!
- http://www.knime.org: until the latest version, the development version crashed at startup on my computer, but now everything works fine
KNIME, the Konstanz Information Miner, is a modular environment which enables easy visual assembly and interactive execution of a data pipeline. KNIME is built on top of Eclipse, and each node in the workflow is developed as a plugin. As far as I understand KNIME, it only handles tabular data, and that's why I wrote this new plugin converting FASTA files into a table. The KNIME SDK comes with a wizard that creates the Java stubs required for a new KNIME node. Here are a few of the generated files:
- XXXNodeModel
- implements the logic of the node. The most important method,
BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
takes one or more tables as input, transforms them, and returns an array of one or more tables
- XXXNodeDialog
- A Swing-based dialog used to select the options of a node. Here, I've created a dialog selecting the FASTA file
- XXXNodeView
- Visualizes the result of the node. Here, I didn't write a view, but one could imagine a graphical interface drawing the GC%, etc.
- XXXNodeFactory
- A class generating the model, the dialog, the views, ...
- XXXNodePlugin
- Describes the plugin. It is only used by Eclipse
The sources I wrote are available here. Warning: the sources are a draft and I'm still learning the KNIME API, so I guess there must be a cleverer/smarter/safer way to write this stuff:
- FastaIterator.java: iterator over a fasta file
- FastaTable.java: the tabular representation of the sequences. It contains two columns Name and Sequence
- ReadFastaNodeDialog.java: the dialog selecting the fasta file
- ReadFastaNodeFactory.java: the node factory
- ReadFastaNodeModel.java: implements the logic of the node. Creates and returns the FastaTable
- ReadFastaNodePlugin: used by Eclipse
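To give an idea of what an iterator over a FASTA file can look like, here is a minimal sketch in plain Java, without any KNIME dependency. It is not the FastaIterator.java listed above: the class name FastaRecordIterator and the helper readAll are hypothetical, and I assume that each record starts with a '>' header line and that the sequence may span several lines. Each record is returned as a {name, sequence} pair, matching the two columns of the table.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of a FASTA iterator; not the FastaIterator.java from this post. */
class FastaRecordIterator implements Iterator<String[]> {
    private final BufferedReader in;
    /** Header (without '>') of the next record, or null when the input is exhausted. */
    private String pendingHeader;

    FastaRecordIterator(Reader reader) {
        this.in = new BufferedReader(reader);
        // Skip anything that precedes the first '>' header line.
        String line;
        while ((line = readLine()) != null) {
            if (line.startsWith(">")) {
                pendingHeader = line.substring(1).trim();
                break;
            }
        }
    }

    /** Reads one line, converting the checked IOException into an unchecked one. */
    private String readLine() {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    /** Returns the next record as {name, sequence}. */
    @Override
    public String[] next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        String name = pendingHeader;
        pendingHeader = null;
        StringBuilder sequence = new StringBuilder();
        String line;
        // Concatenate sequence lines until the next header or the end of input.
        while ((line = readLine()) != null) {
            if (line.startsWith(">")) {
                pendingHeader = line.substring(1).trim();
                break;
            }
            sequence.append(line.trim());
        }
        return new String[] { name, sequence.toString() };
    }

    /** Convenience helper: collects all records as two-column rows. */
    static List<String[]> readAll(Reader reader) {
        List<String[]> rows = new ArrayList<>();
        Iterator<String[]> it = new FastaRecordIterator(reader);
        while (it.hasNext()) {
            rows.add(it.next());
        }
        return rows;
    }

    public static void main(String[] args) {
        String fasta = ">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n";
        for (String[] row : readAll(new StringReader(fasta))) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

In a real node, the rows produced this way would still have to be wrapped into KNIME's own table classes inside execute(); that part is left out here because it depends on the KNIME API.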
Here is a screenshot: my node reads a FASTA file and transforms it into a two-column table; the following nodes 'grep' all the sequences containing the word ONCOGENE, sort them and output the result in a table (smaller window at the bottom).
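For comparison, the 'grep' and sort steps of that workflow can be sketched in a few lines of plain Java. This is only an illustration of what the pipeline computes, not how KNIME implements its nodes. The class name OncogeneFilterSketch is hypothetical, and I assume here that the keyword is searched in the Name column and that the rows are sorted by name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical plain-Java equivalent of the 'grep' + sort steps of the workflow. */
class OncogeneFilterSketch {

    /**
     * Keeps the rows whose name (column 0) contains the keyword,
     * then sorts the kept rows by name. Each row is {name, sequence}.
     */
    static List<String[]> filterAndSort(List<String[]> rows, String keyword) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (row[0].contains(keyword)) {
                kept.add(row);
            }
        }
        kept.sort(Comparator.comparing((String[] row) -> row[0]));
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] { "z ONCOGENE", "AAA" });
        rows.add(new String[] { "kinase", "CCC" });
        rows.add(new String[] { "a ONCOGENE", "GGG" });
        for (String[] row : filterAndSort(rows, "ONCOGENE")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

The point of a workflow engine is that the members of my lab can assemble these steps visually instead of writing such code.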
That's it for tonight.
Pierre