18 July 2013

Running a picard tool in the #KNIME workflow engine

KNIME (http://www.knime.org/) is "a user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting". In this post, I'll show how to invoke an external Java program, more precisely a tool from the Picard library, from within KNIME. The workflow: load a list of BAM filenames, invoke SortSam on each file and display the names of the sorted files.

Construct the following workflow:



Edit the FileReader node and load a list of paths to the BAMs


Edit the properties of the Java Snippet node: in the "Additional Libraries" tab, add SortSam.jar.



Edit the Java Snippet node and create a new output column SORTED_BAM; the code refers to it through the Java field out_SORTED, and to the input column BAM through c_BAM.



and copy the following code:

// Your custom imports:
import net.sf.picard.sam.SortSam;
import java.io.File;
----------------------------------------------------------
// Enter your code here:


File input=new File(c_BAM);

//build the output filename 
out_SORTED = input.getName();
if(!(out_SORTED.endsWith(".sam") || out_SORTED.endsWith(".bam")))
{
 throw new Abort("not a SAM/BAM :"+c_BAM);
}
int dot=out_SORTED.lastIndexOf('.');
out_SORTED=new File(input.getParentFile(),out_SORTED.substring(0, dot)+"_sorted.bam").getPath();

//create a new instance of SortSam
SortSam cmd=new SortSam();

//invoke the instance
int ret=cmd.instanceMain(new String[]{
 "I="+input.getPath(),
 "O="+out_SORTED,
 "SO=coordinate",
 "VALIDATION_STRINGENCY=LENIENT",
 "CREATE_INDEX=true",
 "MAX_RECORDS_IN_RAM=500000"
 });

if(ret!=0)
{
 throw new Abort("SortSam failed with: "+c_BAM+" "+out_SORTED);
}
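
The path handling in the snippet can be tried outside KNIME before wiring it into the node. Here is a minimal stand-alone sketch of that part only: it derives the "_sorted.bam" name and builds the same KEY=VALUE arguments, but it does not call Picard. The class name SortedBamName, its two helper methods and the path /data/sample1.bam are my own placeholders, and a plain IllegalArgumentException stands in for KNIME's Abort, which only exists inside the Java Snippet node.

```java
import java.io.File;

/** Stand-alone sketch of the path handling done inside the Java Snippet node. */
final class SortedBamName {

    /** Returns the path of the "_sorted.bam" file placed next to the input file. */
    static String sortedPath(String bamPath) {
        File input = new File(bamPath);
        String name = input.getName();
        if (!(name.endsWith(".sam") || name.endsWith(".bam"))) {
            // The snippet throws KNIME's Abort here; a plain exception stands in for it.
            throw new IllegalArgumentException("not a SAM/BAM: " + bamPath);
        }
        int dot = name.lastIndexOf('.');
        // A null parent (bare file name) is accepted by this File constructor.
        return new File(input.getParentFile(), name.substring(0, dot) + "_sorted.bam").getPath();
    }

    /** Builds the KEY=VALUE arguments that the snippet passes to SortSam. */
    static String[] sortSamArgs(String bamPath) {
        return new String[] {
            "I=" + new File(bamPath).getPath(),
            "O=" + sortedPath(bamPath),
            "SO=coordinate",
            "VALIDATION_STRINGENCY=LENIENT",
            "CREATE_INDEX=true",
            "MAX_RECORDS_IN_RAM=500000"
        };
    }

    public static void main(String[] args) {
        // Hypothetical input path, used only for illustration.
        for (String arg : sortSamArgs("/data/sample1.bam")) {
            System.out.println(arg);
        }
    }
}
```

Running it prints one argument per line, which makes it easy to check the output name before any BAM is actually sorted.
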

Execute the workflow: Picard runs the jobs, and the output table lists the names of the sorted BAMs:



Edit:

The workflow was uploaded to MyExperiment at http://www.myexperiment.org/workflows/3654.html.


That's it,

Pierre

