Knime.org: creating a new Source Node reading Fasta sequences.
This post is about how I wrote a Java plugin for the workflow engine KNIME (http://www.knime.org). This plugin reads a FASTA file containing one or more sequences and transforms it into a table containing two columns: one for the name of the sequence and the other for the sequence itself.
In the last few weeks, I've been looking for a workflow engine that could be easily handled by the members of my lab. I tested three tools:
- Taverna (now in version 2) is mainly devoted to web services. I've tried to learn how to use Taverna a few times but I still find it neither user-friendly nor intuitive
- http://kepler-project.org/: 141 MB! Ouch!
- http://www.knime.org: until the latest version, the development version crashed at startup on my computer, but now everything works fine
KNIME, the Konstanz Information Miner, is a modular environment which enables easy visual assembly and interactive execution of a data pipeline. KNIME is built on Eclipse and each node in the workflow is developed as a plugin. As far as I understand KNIME, it only handles tabular data and that's why I wrote this new plugin converting a set of FASTA sequences into a table. The KNIME SDK comes with a wizard creating the Java stubs required to create a new KNIME node. Here are a few of the files:
- XXXNodeModel
- implements the logic of the node. The most important method
BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
takes as input zero or more tables, transforms them and returns an array of one or more tables
- XXXNodeDialog
- A Swing-based dialog used to select the options of a node. Here, I've created a dialog selecting the FASTA file
- XXXNodeView
- Visualizes the result of the node. Here, I didn't write a view, but one could imagine a graphical interface drawing the GC%, etc.
- XXXNodeFactory
- A class generating the model, the dialog, the views, ...
- XXXNodePlugin
- Describes the plugin. It is only used by eclipse
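As an illustration of what such a view could compute, here is a minimal plain-Java sketch of a GC% calculation. The class GcContent and the method gcPercent are hypothetical names, they are not part of the plugin, and the choice to count every letter (including N and other ambiguity codes) in the total is an assumption.

```java
/** Hypothetical helper, not part of the plugin: GC percentage of a sequence. */
class GcContent {
    /**
     * Returns the percentage of G and C among the letters of seq.
     * Non-letter characters (gaps, digits, whitespace) are ignored.
     * Assumption: ambiguity codes such as N count in the total.
     */
    static double gcPercent(String seq) {
        int gc = 0;
        int total = 0;
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (!Character.isLetter(c)) continue;
            total++;
            if (c == 'G' || c == 'C') gc++;
        }
        // an empty sequence has no defined GC%; return 0 by convention
        return total == 0 ? 0.0 : 100.0 * gc / total;
    }

    public static void main(String[] args) {
        System.out.println(gcPercent("ATGC")); // prints 50.0
    }
}
```

A real view would read the Sequence column of the node's output table and draw these values; that part depends on the KNIME API and is not shown here.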
The sources I wrote are available here. Warning: the sources are a draft and I'm still learning the KNIME API; I guess there must be a cleverer/smarter/safer way to write this stuff
- FastaIterator.java: iterator over a fasta file
- FastaTable.java: the tabular representation of the sequences. It contains two columns Name and Sequence
- ReadFastaNodeDialog.java: the dialog selecting the fasta file
- ReadFastaNodeFactory.java: the node factory
- ReadFastaNodeModel.java: implements the logic of the node. Creates and returns the FastaTable
- ReadFastaNodePlugin.java: used by eclipse
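To make the Name/Sequence layout concrete without any KNIME class, here is a small line-based sketch of the same conversion. It is not the code of the plugin (FastaIterator reads character by character and streams the file): the class FastaSketch and the method parseFasta are hypothetical, and the whole text is assumed to fit in memory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch: FASTA text to rows of {name, sequence}. */
class FastaSketch {
    static List<String[]> parseFasta(String text) {
        List<String[]> rows = new ArrayList<>();
        String name = null;
        StringBuilder seq = new StringBuilder();
        for (String line : text.split("\\r?\\n")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                // a new header closes the previous record, if any
                if (name != null) rows.add(new String[] {name, seq.toString()});
                name = line.substring(1);
                seq.setLength(0);
            } else {
                if (name == null) {
                    throw new IllegalArgumentException("expected '>' before sequence data");
                }
                // remove any whitespace inside the sequence line
                seq.append(line.replaceAll("\\s+", ""));
            }
        }
        // the last record has no following header to close it
        if (name != null) rows.add(new String[] {name, seq.toString()});
        return rows;
    }

    public static void main(String[] args) {
        String fasta = ">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n";
        for (String[] row : parseFasta(fasta)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

With the sample text in main, the two rows are (seq1 first, ACGTTTGA) and (seq2, GGCC): the sequence lines of one record are concatenated into a single cell.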
package org.lindenb.knime.readfasta;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PushbackReader;
import org.knime.core.data.DataCell;
import org.knime.core.data.DataRow;
import org.knime.core.data.RowIterator;
import org.knime.core.data.RowKey;
import org.knime.core.data.def.DefaultRow;
import org.knime.core.data.def.StringCell;

/** Iterates over a FASTA file, returning one row (name, sequence) per record. */
public class FastaIterator extends RowIterator
{
    private PushbackReader reader = null;
    StringBuilder name = new StringBuilder();
    StringBuilder seq = new StringBuilder();
    int rowIndex = 0;
    // true if the next record has already been read by hasNext()
    private boolean has_next_tested = false;
    private boolean has_next = false;

    FastaIterator(String source)
    {
        try
        {
            this.reader = new PushbackReader(new BufferedReader(new FileReader(source)));
        }
        catch(IOException err)
        {
            // the file cannot be opened: the iterator will be empty
            this.reader = null;
            err.printStackTrace();
        }
    }

    @Override
    public boolean hasNext()
    {
        System.err.println("hasNext called");
        if(has_next_tested) return has_next;
        has_next_tested = true;
        has_next = false;
        if(reader == null)
        {
            return has_next;
        }
        try
        {
            int c;
            // skip the whitespace before the next header
            while((c = reader.read()) != -1)
            {
                if(Character.isWhitespace(c)) continue;
                break;
            }
            if(c == -1)
            {
                // regular end of file: no more sequence
                reader.close();
                reader = null;
                return has_next;
            }
            if(c != '>')
            {
                throw new IOException("expected '>'");
            }
            // read the header line
            while((c = reader.read()) != -1)
            {
                if(c == '\n') break;
                if(c == '\r') continue; // ignore Windows line endings
                name.append((char)c);
            }
            boolean at_start = true;
            System.err.println("Found >" + name);
            // read the sequence until the next '>' found at the start of a line
            while((c = reader.read()) != -1)
            {
                if(at_start && c == '>')
                {
                    reader.unread(c);
                    break;
                }
                else if(c == '\n')
                {
                    at_start = true;
                    continue;
                }
                at_start = false;
                if(Character.isWhitespace(c)) continue;
                seq.append((char)c);
            }
            System.err.println("Found :" + seq);
        }
        catch(IOException err)
        {
            try { reader.close(); reader = null; } catch(IOException err2) {}
        }
        System.err.println("name & seq: " + name + " " + seq);
        has_next = name.length() > 0;
        return has_next;
    }

    @Override
    public DataRow next()
    {
        if(!has_next_tested) hasNext();
        if(!has_next) throw new IllegalStateException("No next Fasta sequence");
        has_next_tested = false;
        has_next = false;
        DataCell cellName = new StringCell(this.name.toString());
        DataCell cellSeq = new StringCell(this.seq.toString());
        // reset the buffers for the next record
        this.name.setLength(0);
        this.seq.setLength(0);
        int index = rowIndex;
        ++rowIndex;
        return new DefaultRow(RowKey.createRowKey(index), cellName, cellSeq);
    }
}
package org.lindenb.knime.readfasta;

import org.knime.core.data.DataColumnSpec;
import org.knime.core.data.DataColumnSpecCreator;
import org.knime.core.data.DataTable;
import org.knime.core.data.DataTableSpec;
import org.knime.core.data.DataType;
import org.knime.core.data.RowIterator;
import org.knime.core.data.def.StringCell;
import org.knime.core.node.ExecutionContext;

public class FastaTable implements DataTable {
    // the layout of the table: two String columns, Name and Sequence
    private DataTableSpec dataTableSpec = createDataTableSpec();
    private ExecutionContext exec;
    private String filename;

    public FastaTable(String filename, final ExecutionContext exec)
    {
        this.filename = filename;
        this.exec = exec;
    }

    public static DataTableSpec createDataTableSpec()
    {
        DataColumnSpec name = new DataColumnSpecCreator("Name", DataType.getType(StringCell.class)).createSpec();
        DataColumnSpec seq = new DataColumnSpecCreator("Sequence", DataType.getType(StringCell.class)).createSpec();
        return new DataTableSpec(
            name, seq
            );
    }

    @Override
    public DataTableSpec getDataTableSpec()
    {
        return this.dataTableSpec;
    }

    @Override
    public RowIterator iterator()
    {
        // each call opens the file again and starts from the first sequence
        System.err.println("FastaTable called with filename=" + filename);
        return new FastaIterator(this.filename);
    }
}
package org.lindenb.knime.readfasta;

import javax.swing.JFileChooser;
import org.knime.core.node.defaultnodesettings.DefaultNodeSettingsPane;
import org.knime.core.node.defaultnodesettings.DialogComponentFileChooser;
import org.knime.core.node.defaultnodesettings.SettingsModelString;

/**
 * <code>NodeDialog</code> for the "ReadFasta" Node.
 * Read fasta files
 *
 * This node dialog derives from {@link DefaultNodeSettingsPane} which allows
 * creation of a simple dialog with standard components. If you need a more
 * complex dialog please derive directly from
 * {@link org.knime.core.node.NodeDialogPane}.
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodeDialog extends DefaultNodeSettingsPane {
    /**
     * New pane for configuring ReadFasta node dialog.
     * This is just a suggestion to demonstrate possible default dialog
     * components.
     */
    protected ReadFastaNodeDialog() {
        super();
        addDialogComponent(new DialogComponentFileChooser(
            new SettingsModelString(
                ReadFastaNodeModel.CFGKEY_FILE,
                ""
                ),
            ReadFastaNodeModel.CFGKEY_FILE,
            JFileChooser.OPEN_DIALOG,
            ".fa", ".fasta", ".txt"));
    }
}
package org.lindenb.knime.readfasta;

import org.knime.core.node.NodeDialogPane;
import org.knime.core.node.NodeFactory;
import org.knime.core.node.NodeView;

/**
 * <code>NodeFactory</code> for the "ReadFasta" Node.
 * Read fasta files
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodeFactory
        extends NodeFactory<ReadFastaNodeModel> {
    /**
     * {@inheritDoc}
     */
    @Override
    public ReadFastaNodeModel createNodeModel() {
        return new ReadFastaNodeModel();
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public int getNrNodeViews() {
        return 0;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public NodeView<ReadFastaNodeModel> createNodeView(final int viewIndex,
            final ReadFastaNodeModel nodeModel) {
        throw new IllegalStateException();
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public boolean hasDialog() {
        return true;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public NodeDialogPane createNodeDialogPane() {
        return new ReadFastaNodeDialog();
    }
}
package org.lindenb.knime.readfasta;

import java.io.File;
import java.io.IOException;
import org.knime.core.data.DataTableSpec;
import org.knime.core.node.BufferedDataTable;
import org.knime.core.node.CanceledExecutionException;
import org.knime.core.node.defaultnodesettings.SettingsModelString;
import org.knime.core.node.ExecutionContext;
import org.knime.core.node.ExecutionMonitor;
import org.knime.core.node.InvalidSettingsException;
import org.knime.core.node.NodeModel;
import org.knime.core.node.NodeSettingsRO;
import org.knime.core.node.NodeSettingsWO;

/**
 * This is the model implementation of ReadFasta.
 * Read fasta files
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodeModel extends NodeModel {
    /** the settings key which is used to retrieve and
        store the settings (from the dialog or from a settings file)
        (package visibility to be usable from the dialog). */
    static final String CFGKEY_FILE = "fasta.file.name";

    // the path of the fasta file, filled from the dialog
    // and used in the model's execute method. The default components of the
    // dialog work with "SettingsModels".
    private final SettingsModelString fileInput =
        new SettingsModelString(CFGKEY_FILE, "");

    /**
     * Constructor for the node model: no input port, one output port.
     */
    protected ReadFastaNodeModel() {
        super(0, 1);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected BufferedDataTable[] execute(final BufferedDataTable[] inData,
            final ExecutionContext exec) throws Exception
    {
        System.err.println("execute called");
        String fname = fileInput.getStringValue();
        System.err.println("execute called fname=" + fname);
        FastaTable out = new FastaTable(fname, exec);
        return new BufferedDataTable[]{exec.createBufferedDataTable(out, exec)};
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void reset() {
        System.err.println("ReadFastaModel: reset called");
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected DataTableSpec[] configure(final DataTableSpec[] inSpecs)
            throws InvalidSettingsException
    {
        System.err.println("ReadFastaModel: configure called");
        return new DataTableSpec[]{FastaTable.createDataTableSpec()};
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void saveSettingsTo(final NodeSettingsWO settings)
    {
        System.err.println("ReadFastaModel: saveSettingsTo called " + this.fileInput);
        this.fileInput.saveSettingsTo(settings);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void loadValidatedSettingsFrom(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        this.fileInput.loadSettingsFrom(settings);
        System.err.println("ReadFastaModel: loadValidatedSettingsFrom called input=" + this.fileInput);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void validateSettings(final NodeSettingsRO settings)
            throws InvalidSettingsException
    {
        this.fileInput.validateSettings(settings);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void loadInternals(final File internDir,
            final ExecutionMonitor exec) throws IOException,
            CanceledExecutionException {
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void saveInternals(final File internDir,
            final ExecutionMonitor exec) throws IOException,
            CanceledExecutionException {
    }
}
/* @(#)$RCSfile$
 * $Revision$ $Date$ $Author$
 *
 */
package org.lindenb.knime.readfasta;

import org.eclipse.core.runtime.Plugin;
import org.osgi.framework.BundleContext;

/**
 * This is the eclipse bundle activator.
 * Note: KNIME node developers probably won't have to do anything in here,
 * as this class is only needed by the eclipse platform/plugin mechanism.
 * If you want to move/rename this file, make sure to change the plugin.xml
 * file in the project root directory accordingly.
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodePlugin extends Plugin {
    /** Make sure that this *always* matches the ID in plugin.xml. */
    public static final String PLUGIN_ID = "org.lindenb.knime.readfasta";

    // The shared instance.
    private static ReadFastaNodePlugin plugin;

    /**
     * The constructor.
     */
    public ReadFastaNodePlugin() {
        super();
        plugin = this;
    }

    /**
     * This method is called upon plug-in activation.
     *
     * @param context The OSGI bundle context
     * @throws Exception If this plugin could not be started
     */
    @Override
    public void start(final BundleContext context) throws Exception {
        super.start(context);
    }

    /**
     * This method is called when the plug-in is stopped.
     *
     * @param context The OSGI bundle context
     * @throws Exception If this plugin could not be stopped
     */
    @Override
    public void stop(final BundleContext context) throws Exception {
        super.stop(context);
        plugin = null;
    }

    /**
     * Returns the shared instance.
     *
     * @return Singleton instance of the Plugin
     */
    public static ReadFastaNodePlugin getDefault() {
        return plugin;
    }
}
package org.lindenb.knime.readfasta;

import org.knime.core.node.NodeView;

/**
 * <code>NodeView</code> for the "ReadFasta" Node.
 * Read fasta files
 *
 * @author Pierre Lindenbaum
 */
public class ReadFastaNodeView extends NodeView<ReadFastaNodeModel> {
    /**
     * Creates a new view.
     *
     * @param nodeModel The model (class: {@link ReadFastaNodeModel})
     */
    protected ReadFastaNodeView(final ReadFastaNodeModel nodeModel) {
        super(nodeModel);
        // TODO instantiate the components of the view here.
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void modelChanged() {
        // TODO retrieve the new model from your nodemodel and
        // update the view.
        ReadFastaNodeModel nodeModel =
            (ReadFastaNodeModel)getNodeModel();
        assert nodeModel != null;
        // be aware of a possibly not executed nodeModel! The data you retrieve
        // from your nodemodel could be null, empty, or invalid in any kind.
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void onClose() {
        // TODO things to do when closing the view
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void onOpen() {
        // TODO things to do when opening the view
    }
}
Here is a screenshot: my node reads a FASTA file and transforms it into a two-column table; the following nodes 'grep' all the sequences containing the word ONCOGENE, sort the sequences and output the result in a table (smaller window at the bottom).
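For readers who don't use KNIME, the filter and sort steps of this workflow can be sketched in plain Java. This is only an analogy to what the standard KNIME nodes do in the screenshot: the class FilterAndSort and the method grepAndSort are hypothetical, and matching the word against the Name column, ignoring case and sorting by name are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical plain-Java analogy of the 'grep' and sort steps of the workflow. */
class FilterAndSort {
    /**
     * Keeps the rows {name, sequence} whose name contains the word,
     * then sorts them by name. Assumption: the match is case-insensitive.
     */
    static List<String[]> grepAndSort(List<String[]> rows, String word) {
        String needle = word.toLowerCase();
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (row[0].toLowerCase().contains(needle)) kept.add(row);
        }
        kept.sort(Comparator.comparing((String[] row) -> row[0]));
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"zeta ONCOGENE", "ACGT"});
        rows.add(new String[] {"kinase", "GGCC"});
        rows.add(new String[] {"alpha oncogene", "TTAA"});
        for (String[] row : grepAndSort(rows, "ONCOGENE")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

In KNIME itself no code is needed for these steps: they are existing nodes connected to the output port of the FASTA reader.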
That's it for tonight.
Pierre