03 December 2015
GATK-UI: a Java Swing interface for the Genome Analysis Toolkit.
I've just pushed GATK-UI, a Java Swing interface for the Genome Analysis Toolkit (GATK), at https://github.com/lindenb/gatk-ui. This tool is also available as a WebStart/JNLP application.
Screenshot
Why did you create this tool?
Some non-bioinformatician collaborators often want coverage data for a defined set of BAM files over a specific region...
Did you test every tool?
NO
How did you create an interface for each GATK tool?
Each GATK tool is documented on a web page, e.g. https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php, and
each web page is associated with a structured JSON document: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php.json
{
"summary": "Select a subset of variants from a larger callset",
"parallel": [
{
"arg": "-nt",
"link": "http://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_engine_CommandLineGATK.php#-nt",
"name": "TreeReducible"
}
],
"activeregion": {},
This JSON is transformed to XML so that it can be processed with XSLT. An XSLT stylesheet then generates the Java code for each tool's interface.
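To give an idea of the mechanism, here is a minimal sketch of that pipeline (this is not the actual gatk-ui source; the stylesheet name json2swing.xsl and the output file name are made up for the example): the tool's JSON description is converted to XML with the org.json library, and the result is fed to the JDK's built-in XSLT engine.

import java.io.File;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.json.JSONObject;
import org.json.XML;

public class Json2Java {
    public static void main(String[] args) throws Exception {
        // read the '.json' page describing one GATK tool
        String json = new String(Files.readAllBytes(Paths.get(args[0])), "UTF-8");
        // convert the JSON document to XML, wrapped in a root element
        String xml = "<tool>" + XML.toString(new JSONObject(json)) + "</tool>";
        // apply the XSLT stylesheet that emits the Swing code for this tool
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("json2swing.xsl")));
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(new File("ToolUI.java")));
    }
}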
That's it,
Pierre
Posted by Pierre Lindenbaum at 10:20 AM, 0 comments
Labels: bioinformatics, code, gatk, gui, java, json, xml, xslt
13 July 2015
Playing with #Docker, my notebook
This post is my notebook about Docker, written after a very nice introduction to Docker by François Moreews (INRIA/IRISA, Rennes). I used Docker today for the first time; my aim was just to create an image containing https://github.com/lindenb/verticalize, a small tool I wrote to verticalize text files.
Install docker
You hate running this kind of command line, don't you?
$ wget -qO- https://get.docker.com/ | sh
sudo password for tatayoyo:
apparmor is enabled in the kernel and apparmor utils were already installed
/sbin/apparmor_parser
+ [ https://get.docker.com/ = https://get.docker.com/ ]
+ sudo -E sh -c apt-key adv --keyserver
(...)
add my linux account to the "docker" group
sudo usermod -aG docker tatayoyo
Logout and log in again...
I'm working behind a $@!# proxy: edit /etc/default/docker to set the proxy-things
$ cat /etc/default/docker
(...)
# If you need Docker to use an HTTP proxy, it can also be specified here.
export http_proxy="http://(proxy-host):(proxy-port)/"
export https_proxy="http://(proxy-host):(proxy-port)/"
export ftp_proxy="http://(proxy-host):(proxy-port)/"
export HTTP_PROXY="http://(proxy-host):(proxy-port)/"
export FTP_PROXY="http://(proxy-host):(proxy-port)/"
export HTTPS_PROXY="http://(proxy-host):(proxy-port)/"
(...)
start the docker service
$ sudo start docker
[sudo] password for tatayoyo:
docker start/running, process 5023
create the Dockerfile
Create a new directory and, in this directory, create a file named "Dockerfile". It contains:
- the name of the base image we're using (here the latest ubuntu)
- the $@!# proxy settings (again ???!!!!)
- some calls to `apt-get` to install git, gcc, make ...
- some statements to clone https://github.com/lindenb/verticalize, compile it and install my tool into /usr/local/bin
FROM ubuntu:latest
ENV http_proxy "http://(proxy-host):(proxy-port)/"
ENV https_proxy "http://(proxy-host):(proxy-port)/"
ENV ftp_proxy "http://(proxy-host):(proxy-port)/"
ENV HTTP_PROXY "http://(proxy-host):(proxy-port)/"
ENV HTTPS_PROXY "http://(proxy-host):(proxy-port)/"
ENV FTP_PROXY "http://(proxy-host):(proxy-port)/"
RUN apt-get update
RUN apt-get install -y wget gcc make git
RUN git clone "https://github.com/lindenb/verticalize.git" /tmp/verticalize && make -C /tmp/verticalize && cp /tmp/verticalize/verticalize /usr/local/bin && rm -rf /tmp/verticalize
create the image 'verticalize' from the Dockerfile
sudo docker build -t verticalize .
(...)
List the images
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
verticalize latest 5f7159b4921a 12 seconds ago 317 MB
(...)
Tag the 'verticalize' image for my Docker Hub repository https://registry.hub.docker.com/u/lindenb
$ docker tag 5f7159b4921a lindenb/verticalize:latest
$ docker images
REPOSITORY            TAG      IMAGE ID       CREATED              VIRTUAL SIZE
verticalize           latest   5f7159b4921a   About a minute ago   317 MB
lindenb/verticalize   latest   5f7159b4921a   About a minute ago   317 MB
Push the image to dockerhub
$ docker push lindenb/verticalize
The push refers to a repository [lindenb/verticalize] (len: 1)
5f7159b4921a: Image push failed
Please login prior to push:
Username: lindenb
Password:
Email: xxxxxxx@yahoo.fr
WARNING: login credentials saved in /home/tatyoyo/.docker/config.json
Login Succeeded
The push refers to a repository [lindenb/verticalize] (len: 1)
5f7159b4921a: Image already exists
68f6ddc7de15: Buffering to Disk
We can now remove the local image ...
$ docker rmi -f 5f7159b4921a
... and pull the image from Docker Hub
$ docker pull lindenb/verticalize
latest: Pulling from lindenb/verticalize
83e4dde6b9cf: Downloading [==================> ] 24.82 MB/65.79 MB
b670fb0c7ecd: Download complete
29460ac93442: Download complete
d2a0ecffe6fa: Download complete
48e98a1c03ae: Download complete
94ac1beb0514: Download complete
e12eda8693a9: Download complete
5eb1952afbb7: Download complete
fb4ac6e6a264: Download complete
0f8372bacf03: Download complete
789c4f122778: Downloading [=================> ] 7.511 MB/20.92 MB
68f6ddc7de15: Downloading [=====> ] 4.99 MB/44.61 MB
5f7159b4921a: Download complete
Finally, run a command inside the Docker container:
My tool verticalize is installed in the image 'lindenb/verticalize:latest':
$ cat << EOF | docker run -i lindenb/verticalize:latest
> echo -e "#X\tY\n1\t2\n3\t4" | verticalize
> EOF
>>> 2
$1 #X 1
$2 Y 2
<<< 2
>>> 3
$1 #X 3
$2 Y 4
<<< 3
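To make the output above easier to read: verticalize prints each data row of a tab-delimited file vertically, one "$index column-name value" line per field, between ">>>"/"<<<" markers. The real tool is written in C; the following is only a rough Java sketch of the same idea (class name and exact formatting are illustrative, not the original implementation):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/** Rough Java sketch of what verticalize does; the original tool is written in C. */
public class Verticalize {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String[] header = null;
        String line;
        long lineNumber = 0;
        while ((line = in.readLine()) != null) {
            ++lineNumber;
            String[] tokens = line.split("\t", -1);
            if (header == null) { header = tokens; continue; } // first line = column names
            System.out.println(">>> " + lineNumber);
            for (int i = 0; i < tokens.length; i++) {
                String name = (i < header.length ? header[i] : ".");
                System.out.println("$" + (i + 1) + "\t" + name + "\t" + tokens[i]);
            }
            System.out.println("<<< " + lineNumber);
        }
    }
}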
That's it,
Pierre
Posted by Pierre Lindenbaum at 3:44 PM, 0 comments
29 June 2015
A BLAST to SAM converter.
Some time ago, I received a set of Ion Torrent mate-pair reads of poor quality. I wasn't able to align much of it using bwa, and I had always wondered whether I could get better alignments using NCBI BLASTN (short answer: no). That's why I asked guyduche, my internship student, to write a C program to convert the output of blastn to SAM. His code is available on GitHub.
The input for blast2bam is:
- the XML output of NCBI blastn (or stdin)
- the single or paired FASTQ file(s)
- the reference sequence indexed with Picard (the .dict file)
Example:
fastq2fasta in.R1.fq.gz in.R2.fq.gz |\
blastn -db REFERENCE -outfmt 5 | \
blast2bam -o result.bam -W 40 -R '@RG ID:foo SM:sample' - REFERENCE.dict in.R1.fq.gz in.R2.fq.gz
Output:
@SQ SN:gi|9629357|ref|NC_001802.1| LN:9181
@RG ID:foo SM:sample
@PG ID:Blast2Bam PN:Blast2Bam VN:0.1 CL:../../bin/blast2bam -o results.sam -W 40 -R @RG ID:foo SM:sample - db.dict test_1.fastq.gz test_2.fastq.gz
(...)
ERR656485.2 83 gi|9629357|ref|NC_001802.1| 715 60 180S7=1X8=1X11=1X2=2X4=1X14=1X8=1X33=1X4=1X2=1X5=1X2=1X6=1S = 715 -119 CCTAGTGTTGCTTGCTTTTCTTCTTTTTTTTTTCAAGCAGAAGACGGCATACGAGATCCTCTATCGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTAAGATAGAGGAAGAACAAAACAAATGTCAGCAAAGTCAGCAAAAGACACAGCAGGAAAAAGGGGCTGACGGGAAGGTCAGTCAAAATTATCCTATAGTGCAAAATCTCCAAGGGCAAATGGTACACCAGGCCATGTCACCTAGAACTTTAAATGCATGGGTAAAAGTAATAGAGGAAAAGGCCTTTAGCCCAN (),.((((,(((((,((.((.-(>69>20E>6/=>5EC@9-52?BEE::2951.)74B64=B==FFAF=A??59:>FFFDF:55GGFGF?DFGGFE868>GGGFGGGGED;FGFFGGGGGGGGGGGEFFGE9GGGGFGGGGGGGGDGECGGFGGGGGGGGGGFGGGGGEGGFGGGGGGFFGGGGGFF?EGGFFFEGGGGGGGGFEGGGEGGGFEGGGGGGGGGGDGFFCEGFGGGGGGGGGGGFFECFGGGGFGGGGGGGGGGGFCGGGGGGGGGGGGGGGGGGFGGGGGGGGF@CCA8! NM:i:13 RG:Z:foo AS:i:80 XB:f:148.852 XE:Z:4.07e-39
ERR656485.2 163 gi|9629357|ref|NC_001802.1| 715 60 73S7=1X8=1X11=1X2=2X4=1X14=1X8=1X33=1X4=1X2=1X5=1X2=1X8=106S = 715 119 NAGATAGAGGAAGAACAAAACAAATGTCAGCAAAGTCAGCAAAAGACACAGCAGGAAAAAGGGGCTGACGGGAAGGTCAGTCAAAATTATCCTATAGTGCAAAATCTCCAAGGGCAAATGGTACACCAGGCCATGTCACCTAGAACTTTAAATGCATGGGTAAAAGTAATAGAGGAAAAGGCCTTTAGCCCAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTCTCATTACAAAAAAAACATACACAATAAATGATATAAGCGGAATCAACAGCATGA !8A@CGGEFGFGCDFGGGGGGGGGGGGGGFGGGGGFGFGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGFGGFGGGGGGGGGEGFGFGGGFFGGGGGGGGFGGGGGGGGGGGGFFFFGGGGGG=FFGGFFDGGGGGGGG8FGFGGGGGGGGGFGGGGGGGGGGFDGGFGGFGGGFFFGFF8DFDFDFFFFFFFFFBCDB<@EAFB@ABAC@CDFF?4>EEFE<*>BDAFB@FFBFF>((6<5CC.;C;=D9106(.))).)-46<<))))))))))((,(-)))()((())) NM:i:13 RG:Z:foo AS:i:82 XB:f:152.546 XE:Z:3.15e-40
(...)

Now, I would be interested in finding another dataset where this tool could be successfully used.
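For the curious, here is the core idea of the conversion expressed with htsjdk (blast2bam itself is written in C; this is only a sketch with made-up bases, qualities and a shortened CIGAR, not the program's actual code): the part of the read outside the HSP becomes a soft clip (S), matches and mismatches inside the HSP become '=' and 'X' operators, the Hsp_hit-from value gives the 1-based alignment start, and scores end up in optional tags such as AS.

import htsjdk.samtools.SAMFileHeader;
import htsjdk.samtools.SAMFileWriter;
import htsjdk.samtools.SAMFileWriterFactory;
import htsjdk.samtools.SAMRecord;
import htsjdk.samtools.SAMSequenceRecord;

public class HspToSamSketch {
    public static void main(String[] args) {
        // reference dictionary (normally read from REFERENCE.dict)
        SAMFileHeader header = new SAMFileHeader();
        header.addSequence(new SAMSequenceRecord("gi|9629357|ref|NC_001802.1|", 9181));

        // one record built from one blastn HSP (values shortened/made up)
        SAMRecord rec = new SAMRecord(header);
        rec.setReadName("ERR656485.2");
        rec.setReferenceName("gi|9629357|ref|NC_001802.1|");
        rec.setAlignmentStart(715);        // Hsp_hit-from, 1-based like SAM POS
        rec.setMappingQuality(60);
        rec.setCigarString("5S10=1X4=");   // leading soft clip + matches/mismatch
        rec.setReadString("ACGTAACGTAACGTAACGTA");        // 20 bases, matches the CIGAR
        rec.setBaseQualityString("IIIIIIIIIIIIIIIIIIII");
        rec.setAttribute("AS", 82);        // alignment score

        SAMFileWriter w = new SAMFileWriterFactory()
                .makeSAMWriter(header, true, System.out);
        w.addAlignment(rec);
        w.close();
    }
}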
That's it,
Pierre
Posted by Pierre Lindenbaum at 11:30 AM, 8 comments
18 June 2015
Playing with the #GA4GH schemas and #Avro: my notebook
curl -L -o avro-tools-1.7.7.jar "http://www.eng.lsu.edu/mirrors/apache/avro/avro-1.7.7/java/avro-tools-1.7.7.jar"
curl -L -o schema.zip "https://github.com/ga4gh/schemas/archive/v0.5.1.zip"
unzip schema.zip
rm schema.zip

$ find -name "*.avdl"
./schemas-0.5.1/src/main/resources/avro/readmethods.avdl
./schemas-0.5.1/src/main/resources/avro/common.avdl
./schemas-0.5.1/src/main/resources/avro/wip/metadata.avdl
./schemas-0.5.1/src/main/resources/avro/wip/metadatamethods.avdl
./schemas-0.5.1/src/main/resources/avro/wip/variationReference.avdl
./schemas-0.5.1/src/main/resources/avro/variants.avdl
./schemas-0.5.1/src/main/resources/avro/variantmethods.avdl
./schemas-0.5.1/src/main/resources/avro/beacon.avdl
./schemas-0.5.1/src/main/resources/avro/references.avdl
./schemas-0.5.1/src/main/resources/avro/referencemethods.avdl
./schemas-0.5.1/src/main/resources/avro/reads.avdl
$ java -jar avro-tools-1.7.7.jar compile protocol schemas-0.5.1/src/main/resources/avro/ ./generated
Input files to compile:
  schemas-0.5.1/src/main/resources/avro/variants.avpr

$ find generated/org/ -name "*.java"
generated/org/ga4gh/GAPosition.java
generated/org/ga4gh/GAVariantSetMetadata.java
generated/org/ga4gh/GACall.java
generated/org/ga4gh/GAException.java
generated/org/ga4gh/GACigarOperation.java
generated/org/ga4gh/GAVariantSet.java
generated/org/ga4gh/GAVariants.java
generated/org/ga4gh/GAVariant.java
generated/org/ga4gh/GACallSet.java
generated/org/ga4gh/GACigarUnit.java
Compile, archive and execute:
# compile classes
javac -d generated -cp avro-tools-1.7.7.jar -sourcepath generated:src generated/org/ga4gh/*.java src/test/TestAvro.java
# archive
jar cvf generated/ga4gh.jar -C generated org -C generated test
# run
java -cp avro-tools-1.7.7.jar:generated/ga4gh.jar test.TestAvro > variant.avro

We use the avro-tools to convert the generated file variant.avro to JSON:
java -jar avro-tools-1.7.7.jar tojson variant.avro
Output:
The complete Makefile
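As an aside, the same Avro-to-JSON dump can be done in a few lines of Java with the Avro API itself (a minimal sketch, assuming variant.avro is in the current directory; GenericRecord.toString() renders each record as JSON):

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class DumpAvro {
    public static void main(String[] args) throws Exception {
        // the schema is stored in the container file, so a generic reader is enough
        DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
                new File("variant.avro"), new GenericDatumReader<GenericRecord>());
        while (reader.hasNext()) {
            System.out.println(reader.next().toString()); // one JSON object per record
        }
        reader.close();
    }
}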
That's it,
Pierre
07 May 2015
Monitoring a java application with mbeans. An example with samtools/htsjdk.
"A MBean is a Java object that follows the JMX specification. A MBean can represent a device, an application, or any resource that needs to be managed. The JConsole graphical user interface is a monitoring tool that complies to the JMX specification.". In this post I'll show how I've modified the sources of the htsjdk library to monitor the java program reading a VCF file from the Exac server. See my commit at https://github.com/lindenb/htsjdk/commit/3c1ac1a18917aaa69f8dc49c70fd893a6a0542c3.
First, we define a Java interface ProgressLoggerMBean to tell Java about the information that will be forwarded to the JConsole: the number of records processed, the elapsed time, etc.
package htsjdk.samtools.util;

public interface ProgressLoggerMBean
    {
    /* the noun to use when logging, e.g. "Records, Variants, Loci" */
    public String getNoun();
    /* the verb to log, e.g. "Processed, Read, Written" */
    public String getVerb();
    /** Returns the count of records processed. */
    public long getCount();
    /** elapsed time */
    public String getElapsedTime();
    /** last record */
    public String getLastRecord();
    }

The existing htsjdk class htsjdk.samtools.util.ProgressLogger is modified: it now implements ProgressLoggerMBean:
public class ProgressLogger implements ProgressLoggerInterface, Closeable, ProgressLoggerMBean

The methods are implemented:
(...)
@Override
public String getElapsedTime()
    {
    return this.formatElapseTime(this.getElapsedSeconds());
    }

@Override
public String getLastRecord()
    {
    return this.lastRecord;
    }

In the constructor we try to connect to the MBean server that has been created and initialized by the platform. The ProgressLogger is wrapped into an ObjectName and registered in the MBean server:
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
/* defines an object name for the MBean instance that it will create */
this.objectMBean = new ObjectName("htsjdk.samtools.util:type="+noun);
mbs.registerMBean(this, this.objectMBean);
A 'close' method is used to unregister the object from the MBean server:
@Override
public void close()
    {
    if(this.objectMBean!=null)
        {
        try
            {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            mbs.unregisterMBean(this.objectMBean);
            }
        catch(Exception err)
            {
            //ignore
            }
        finally
            {
            this.objectMBean=null;
            }
        }
    }

Here is an example. This program uses the htsjdk library to parse a VCF file:
import htsjdk.variant.vcf.*;
import htsjdk.variant.variantcontext.*;
import htsjdk.tribble.readers.*;
import htsjdk.samtools.util.*;

public class TestProgress
    {
    private final static Log log = Log.getInstance(TestProgress.class);

    public static void main(String args[]) throws Exception
        {
        ProgressLoggerInterface progress = new ProgressLogger(log, 1000, "Read VCF");
        VCFCodec codec= new VCFCodec();
        LineReader r= LineReaderUtil.fromBufferedStream(System.in);
        LineIteratorImpl t= new LineIteratorImpl(r);
        codec.readActualHeader(t);
        while(t.hasNext())
            {
            VariantContext ctx = codec.decode(t.next());
            progress.record(ctx.getContig(),ctx.getStart());
            }
        r.close();
        progress.close();
        }
    }

Compile and execute to download ExAC:
javac -cp dist/htsjdk-1.130.jar:dist/snappy-java-1.0.3-rc3.jar:dist/commons-jexl-2.1.1.jar:dist/commons-logging-1.1.1.jar TestProgress.java && \
curl -s "ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3/ExAC.r0.3.sites.vep.vcf.gz" |\
gunzip -c |\
java -cp dist/htsjdk-1.130.jar:dist/snappy-java-1.0.3-rc3.jar:dist/commons-jexl-2.1.1.jar:dist/commons-logging-1.1.1.jar:. TestProgress
(...)
INFO 2015-05-07 21:07:02 TestProgress Read VCF 675,000 records. Elapsed time: 00:03:33s. Time for last 1,000: 0s. Last read position: 1:168,035,033
INFO 2015-05-07 21:07:03 TestProgress Read VCF 676,000 records. Elapsed time: 00:03:33s. Time for last 1,000: 0s. Last read position: 1:168,216,140
INFO 2015-05-07 21:07:03 TestProgress Read VCF 677,000 records. Elapsed time: 00:03:34s. Time for last 1,000: 0s. Last read position: 1:169,076,058
INFO 2015-05-07 21:07:03 TestProgress Read VCF 678,000 records. Elapsed time: 00:03:34s. Time for last 1,000: 0s. Last read position: 1:169,366,434
INFO 2015-05-07 21:07:03 TestProgress Read VCF 679,000 records. Elapsed time: 00:03:34s. Time for last 1,000: 0s. Last read position: 1:169,500,081
(...)
The progression can now be monitored in the jconsole:
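The same attributes can also be read programmatically through a JMX connection instead of jconsole. Below is a minimal sketch, assuming the monitored JVM was started with remote JMX enabled on port 9999 (-Dcom.sun.management.jmxremote.port=9999 and the related flags); the ObjectName matches the one registered in the ProgressLogger constructor above:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReadProgressAttributes {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // same name as registered in the ProgressLogger constructor
            ObjectName name = new ObjectName("htsjdk.samtools.util:type=Read VCF");
            System.out.println("Count=" + mbs.getAttribute(name, "Count"));
            System.out.println("ElapsedTime=" + mbs.getAttribute(name, "ElapsedTime"));
            System.out.println("LastRecord=" + mbs.getAttribute(name, "LastRecord"));
        } finally {
            connector.close();
        }
    }
}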
That's it.
Pierre
Posted by Pierre Lindenbaum at 9:47 PM, 0 comments
05 May 2015
Playing with hadoop/mapreduce and htsjdk/VCF : my notebook.
The aim of this test is to count each type of variant/genotype in a VCF file using Apache Hadoop and htsjdk, the Java library for NGS. My source code is available at: https://github.com/lindenb/hadoop-sandbox/blob/master/src/main/java/com/github/lindenb/hadoop/Test.java.
First, and this is my main problem, I needed to create a class 'VcfRow' that would contain the whole data about a variant. As I need to keep the information about all the semantics in the VCF header, each record contains the whole VCF header (!). I asked on StackOverflow whether there was an elegant way to save the header in the hadoop workflow, but it currently seems that there is no such solution (http://stackoverflow.com/questions/30052859/hadoop-mapreduce-handling-a-text-file-with-a-header). This class VcfRow must implement WritableComparable to be serialized by the hadoop pipeline. It's awfully slow since we need to re-create a htsjdk.variant.vcf.VCFCodec and re-parse the htsjdk.variant.vcf.VCFHeader for each new variant.
public static class VcfRow implements WritableComparable<VcfRow>
    {
    private List<String> headerLines;
    private String line;
    private VariantContext ctx=null;
    private VCFHeader header =null;
    private VCFCodec codec=new VCFCodec();

    public VcfRow()
        {
        this.headerLines = Collections.emptyList();
        this.line="";
        }
    public VcfRow(List<String> headerLines,String line)
        {
        this.headerLines=headerLines;
        this.line=line;
        }

    @Override
    public void write(DataOutput out) throws IOException
        {
        out.writeInt(this.headerLines.size());
        for(int i=0;i< this.headerLines.size();++i)
            {
            out.writeUTF(this.headerLines.get(i));
            }
        byte array[]=line.getBytes();
        out.writeInt(array.length);
        out.write(array);
        }

    @Override
    public void readFields(DataInput in) throws IOException
        {
        int n= in.readInt();
        this.headerLines=new ArrayList<String>(n);
        for(int i=0;i<n;++i) this.headerLines.add(in.readUTF());
        n = in.readInt();
        byte array[]=new byte[n];
        in.readFully(array);
        this.line=new String(array);
        this.codec=new VCFCodec();
        this.ctx=null;
        this.header=null;
        }

    public VCFHeader getHeader()
        {
        if(this.header==null)
            {
            this.header = (VCFHeader)this.codec.readActualHeader(new MyLineIterator());
            }
        return this.header;
        }

    public VariantContext getVariantContext()
        {
        if(this.ctx==null)
            {
            if(this.header==null) getHeader();//force decode header
            this.ctx=this.codec.decode(this.line);
            }
        return this.ctx;
        }

    @Override
    public int compareTo(VcfRow o)
        {
        int i = this.getVariantContext().getContig().compareTo(o.getVariantContext().getContig());
        if(i!=0) return i;
        i = this.getVariantContext().getStart() - o.getVariantContext().getStart();
        if(i!=0) return i;
        i = this.getVariantContext().getReference().compareTo( o.getVariantContext().getReference());
        if(i!=0) return i;
        return this.line.compareTo(o.line);
        }

    private class MyLineIterator extends AbstractIterator<String> implements LineIterator
        {
        int index=0;
        @Override
        protected String advance()
            {
            if(index>= headerLines.size()) return null;
            return headerLines.get(index++);
            }
        }
    }
Then a special InputFormat is created for the VCF format. As we need to keep a trace of the header, this class declares `isSplitable==false`. The class VcfInputFormat creates an instance of RecordReader that reads the whole VCF header the first time it is invoked via the method `initialize`. This 'VcfRecordReader' then creates a new VcfRow for each line.

public static class VcfInputFormat extends FileInputFormat<LongWritable, VcfRow>
    {
    private List<String> headerLines=new ArrayList<String>();

    @Override
    public RecordReader<LongWritable, VcfRow> createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException, InterruptedException
        {
        return new VcfRecordReader();
        }

    @Override
    protected boolean isSplitable(JobContext context, Path filename)
        {
        return false;
        }

    //LineRecordReader
    private class VcfRecordReader extends RecordReader<LongWritable, VcfRow>
        {
        private LineRecordReader delegate=new LineRecordReader();

        public VcfRecordReader() throws IOException
            {
            }

        @Override
        public void initialize(InputSplit genericSplit,
                TaskAttemptContext context) throws IOException
            {
            delegate.initialize(genericSplit, context);
            while( delegate.nextKeyValue())
                {
                String row = delegate.getCurrentValue().toString();
                if(!row.startsWith("#")) throw new IOException("Bad VCF header");
                headerLines.add(row);
                if(row.startsWith("#CHROM")) break;
                }
            }

        @Override
        public LongWritable getCurrentKey() throws IOException, InterruptedException
            {
            return delegate.getCurrentKey();
            }

        @Override
        public VcfRow getCurrentValue() throws IOException, InterruptedException
            {
            Text row = this.delegate.getCurrentValue();
            return new VcfRow(headerLines,row.toString());
            }

        @Override
        public float getProgress() throws IOException, InterruptedException
            {
            return this.delegate.getProgress();
            }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException
            {
            return this.delegate.nextKeyValue();
            }

        @Override
        public void close() throws IOException
            {
            delegate.close();
            }
        }
    }
The hadoop mapper uses the information of each VcfRow and produces a count for each category:
public static class VariantMapper extends Mapper<LongWritable, VcfRow, Text, IntWritable>
    {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, VcfRow vcfRow, Context context ) throws IOException, InterruptedException
        {
        VariantContext ctx = vcfRow.getVariantContext();
        if( ctx.isIndel())
            {
            word.set("ctx_indel");
            context.write(word, one);
            }
        if( ctx.isBiallelic())
            {
            word.set("ctx_biallelic");
            context.write(word, one);
            }
        if( ctx.isSNP())
            {
            word.set("ctx_snp");
            context.write(word, one);
            }
        if( ctx.hasID())
            {
            word.set("ctx_id");
            context.write(word, one);
            }
        word.set("ctx_total");
        context.write(word, one);

        for(String sample: vcfRow.getHeader().getSampleNamesInOrder())
            {
            Genotype g =vcfRow.getVariantContext().getGenotype(sample);
            word.set(sample+" "+ctx.getType()+" "+g.getType().name());
            context.write(word, one);
            }
        }
    }
The Reducer computes the sum of each category:
public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable>
    {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context ) throws IOException, InterruptedException
        {
        int sum = 0;
        for (IntWritable val : values)
            {
            sum += val.get();
            }
        result.set(sum);
        context.write(key, result);
        }
    }
and here is the main program:
public static void main(String[] args) throws Exception
    {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "snp count");
    job.setJarByClass(Test.class);
    job.setMapperClass(VariantMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    Path inputPath=new Path(args[0]);
    job.setInputFormatClass(VcfInputFormat.class);
    FileInputFormat.addInputPath(job, inputPath);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
Download, compile, Run:
lindenb@hardyweinberg:~/src/hadoop-sandbox$ make -Bn
rm -rf hadoop-2.7.0
curl -L -o hadoop-2.7.0.tar.gz "http://apache.spinellicreations.com/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz"
tar xvfz hadoop-2.7.0.tar.gz
rm hadoop-2.7.0.tar.gz
touch -c hadoop-2.7.0/bin/hadoop
rm -rf htsjdk-1.130
curl -L -o 1.130.tar.gz "https://github.com/samtools/htsjdk/archive/1.130.tar.gz"
tar xvfz 1.130.tar.gz
rm 1.130.tar.gz
(cd htsjdk-1.130 && ant )
mkdir -p tmp dist
javac -d tmp -cp hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0.jar:hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.0.jar:hadoop-2.7.0/share/hadoop/common/lib/hadoop-annotations-2.7.0.jar:hadoop-2.7.0/share/hadoop/common/lib/log4j-1.2.17.jar:htsjdk-1.130/dist/commons-logging-1.1.1.jar:htsjdk-1.130/dist/htsjdk-1.130.jar:htsjdk-1.130/dist/commons-jexl-2.1.1.jar:htsjdk-1.130/dist/snappy-java-1.0.3-rc3.jar -sourcepath src/main/java src/main/java/com/github/lindenb/hadoop/Test.java
jar cvf dist/test01.jar -C tmp .
rm -rf tmp
mkdir -p input
curl -o input/CEU.exon.2010_09.genotypes.vcf.gz "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/paper_data_sets/a_map_of_human_variation/exon/snps/CEU.exon.2010_09.genotypes.vcf.gz"
gunzip -f input/CEU.exon.2010_09.genotypes.vcf.gz
rm -rf output
HADOOP_CLASSPATH=htsjdk-1.130/dist/commons-logging-1.1.1.jar:htsjdk-1.130/dist/htsjdk-1.130.jar:htsjdk-1.130/dist/commons-jexl-2.1.1.jar:htsjdk-1.130/dist/snappy-java-1.0.3-rc3.jar hadoop-2.7.0/bin/hadoop jar dist/test01.jar com.github.lindenb.hadoop.Test \
    input/CEU.exon.2010_09.genotypes.vcf output
cat output/*
Here is the output of the last command:
15/05/05 17:18:34 INFO input.FileInputFormat: Total input paths to process : 1
15/05/05 17:18:34 INFO mapreduce.JobSubmitter: number of splits:1
15/05/05 17:18:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1186897577_0001
15/05/05 17:18:34 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/05/05 17:18:34 INFO mapreduce.Job: Running job: job_local1186897577_0001
15/05/05 17:18:34 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/05/05 17:18:34 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
15/05/05 17:18:34 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/05/05 17:18:34 INFO mapred.LocalJobRunner: Waiting for map tasks
15/05/05 17:18:34 INFO mapred.LocalJobRunner: Starting task: attempt_local1186897577_0001_m_000000_0
15/05/05 17:18:34 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
15/05/05 17:18:34 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/05/05 17:18:34 INFO mapred.MapTask: Processing split: file:/home/lindenb/src/hadoop-sandbox/input/CEU.exon.2010_09.genotypes.vcf:0+2530564
15/05/05 17:18:34 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/05/05 17:18:34 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/05/05 17:18:34 INFO mapred.MapTask: soft limit at 83886080
15/05/05 17:18:34 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/05/05 17:18:34 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/05/05 17:18:34 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/05/05 17:18:35 INFO mapreduce.Job: Job job_local1186897577_0001 running in uber mode : false
15/05/05 17:18:35 INFO mapreduce.Job: map 0% reduce 0%
15/05/05 17:18:36 INFO mapred.LocalJobRunner:
15/05/05 17:18:36 INFO mapred.MapTask: Starting flush of map output
15/05/05 17:18:36 INFO mapred.MapTask: Spilling map output
15/05/05 17:18:36 INFO mapred.MapTask: bufstart = 0; bufend = 7563699; bufvoid = 104857600
15/05/05 17:18:36 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 24902536(99610144); length = 1311861/6553600
15/05/05 17:18:38 INFO mapred.MapTask: Finished spill 0
15/05/05 17:18:38 INFO mapred.Task: Task:attempt_local1186897577_0001_m_000000_0 is done. And is in the process of committing
(...)
NA12843 SNP HOM_REF 2515
NA12843 SNP HOM_VAR 242
NA12843 SNP NO_CALL 293
NA12872 SNP HET 394
NA12872 SNP HOM_REF 2282
NA12872 SNP HOM_VAR 188
NA12872 SNP NO_CALL 625
NA12873 SNP HET 336
NA12873 SNP HOM_REF 2253
NA12873 SNP HOM_VAR 184
NA12873 SNP NO_CALL 716
NA12874 SNP HET 357
NA12874 SNP HOM_REF 2395
NA12874 SNP HOM_VAR 229
NA12874 SNP NO_CALL 508
NA12878 SNP HET 557
NA12878 SNP HOM_REF 2631
NA12878 SNP HOM_VAR 285
NA12878 SNP NO_CALL 16
NA12889 SNP HET 287
NA12889 SNP HOM_REF 2110
NA12889 SNP HOM_VAR 112
NA12889 SNP NO_CALL 980
NA12890 SNP HET 596
NA12890 SNP HOM_REF 2587
NA12890 SNP HOM_VAR 251
NA12890 SNP NO_CALL 55
NA12891 SNP HET 609
NA12891 SNP HOM_REF 2591
NA12891 SNP HOM_VAR 251
NA12891 SNP NO_CALL 38
NA12892 SNP HET 585
NA12892 SNP HOM_REF 2609
NA12892 SNP HOM_VAR 236
NA12892 SNP NO_CALL 59
ctx_biallelic 3489
ctx_id 3489
ctx_snp 3489
ctx_total 3489
that's it,
Pierre
Posted by Pierre Lindenbaum at 5:24 PM, 1 comment
Labels: bioinformatics, hadoop, mapreduce, ngs, vcf
28 February 2015
Integrating a java program in #usegalaxy.
This is my notebook for the integration of java programs in https://usegalaxy.org/ .
create a directory for your tools under ${galaxy-root}/tools
mkdir ${galaxy-root}/tools/jvarkit
put all the required jar files and the XML files describing your tools (see below) in this new directory:
$ ls ${galaxy-root}/tools/jvarkit/
commons-jexl-2.1.1.jar     groupbygene.jar  htsjdk-1.128.jar           vcffilterjs.jar  vcffilterso.jar  vcfhead.jar  vcftail.jar  vcftrio.jar
commons-logging-1.1.1.jar  groupbygene.xml  snappy-java-1.0.3-rc3.jar  vcffilterjs.xml  vcffilterso.xml  vcfhead.xml  vcftail.xml  vcftrio.xml
Each tool is described with an XML file whose schema is available at https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax . For example, here is a simple file describing the tool VcfHead, which prints the very first variants of a VCF file:
<?xml version="1.0"?>
<tool id="com.github.lindenb.jvarkit.tools.misc.VcfHead" version="1.0.0" name="vcfhead">
<description>Print first variants of a VCF</description>
<requirements>
<requirement type="binary">java</requirement>
</requirements>
<command>(gunzip -c ${input} || cat ${input}) | java -cp $__tool_directory__/commons-jexl-2.1.1.jar:$__tool_directory__/commons-logging-1.1.1.jar:$__tool_directory__/htsjdk-1.128.jar:$__tool_directory__/snappy-java-1.0.3-rc3.jar:$__tool_directory__/vcfhead.jar com.github.lindenb.jvarkit.tools.misc.VcfHead -n '${num}' -o ${output}.vcf.gz && mv ${output}.vcf.gz ${output}</command>
<inputs>
<param format="vcf" name="input" type="data" label="VCF input"/>
<param name="num" type="integer" label="Number of variants" min="0" value="10"/>
</inputs>
<outputs>
<data format="vcf" name="output"/>
</outputs>
<stdio>
<exit_code range="1:"/>
<exit_code range=":-1"/>
</stdio>
<help/>
</tool>
The input file is described by
<param format="vcf" name="input" type="data" label="VCF input"/>
The number of lines is declared in:
<param name="num" type="integer" label="Number of variants" min="0" value="10"/>
Those two variables will be replaced in the command line at runtime by galaxy.
The command line is
(gunzip -c ${input} || cat ${input}) | \
java -cp $__tool_directory__/commons-jexl-2.1.1.jar:$__tool_directory__/commons-logging-1.1.1.jar:$__tool_directory__/htsjdk-1.128.jar:$__tool_directory__/snappy-java-1.0.3-rc3.jar:$__tool_directory__/vcfhead.jar \
com.github.lindenb.jvarkit.tools.misc.VcfHead \
-n '${num}' -o ${output}.vcf.gz && \
mv ${output}.vcf.gz ${output}
It starts with (gunzip -c ${input} || cat ${input}) because we don't know whether the input will be gzipped.
The main problem here is to set the CLASSPATH and tell java where to find the jar libraries. With the help of @pjacock and @jmchilton, I learned that the recent release of Galaxy defines a variable $__tool_directory__ holding the location of your tool directory, so you just have to prefix each jar file with this variable:
$__tool_directory__/commons-jexl-2.1.1.jar:$__tool_directory__/commons-logging-1.1.1.jar:....
You'll need to declare the new tools in ${galaxy-root}/config/tool_conf.xml
(...)
</section>
<section id="jvk" name="JVARKIT">
<tool file="jvarkit/vcffilterjs.xml"/>
<tool file="jvarkit/vcfhead.xml"/>
<tool file="jvarkit/vcftail.xml"/>
<tool file="jvarkit/vcffilterso.xml"/>
<tool file="jvarkit/vcftrio.xml"/>
<tool file="jvarkit/vcfgroupbygene.xml"/>
</section>
</toolbox>
Your tools are now available in the 'tools' menu
Clicking on a link will make Galaxy display a form for your tool:
As far as I can see for now, making a tar archive of your tool directory and uploading it to the Galaxy Tool Shed (https://toolshed.g2.bx.psu.edu/repository) will make your tools available to the scientific community.
That's it,
Pierre
Posted by Pierre Lindenbaum at 5:01 PM, 0 comments
22 February 2015
Drawing a Manhattan plot in SVG using a GWAS+XML model.
On Friday, I saw my colleague @b_l_k start writing SVG+XML code to draw a Manhattan plot. I told him that a better idea would be to describe the data using XML and to transform the XML to SVG using XSLT.
So, let's do this. I put the XSLT stylesheet on GitHub at https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/manhattan.xsl . The data model looks like this (I took the data from @genetics_blog's http://www.gettinggeneticsdone.com/2011/04/annotated-manhattan-plots-and-qq-plots.html):
<?xml version="1.0"?>
<manhattan>
<chromosome name="1" length="1500">
<data rs="1" position="1" p="0.914806043496355"/>
<data rs="2" position="2" p="0.937075413297862"/>
<data rs="3" position="3" p="0.286139534786344"/>
<data rs="4" position="4" p="0.830447626067325"/>
<data rs="5" position="5" p="0.641745518893003"/>
(...)
</chromosome>
<chromosome name="22" length="535">
<data rs="15936" position="1" p="0.367785102687776"/>
<data rs="15937" position="2" p="0.628192085539922"/>
(...)
<data rs="1" position="1" p="0.914806043496355"/>
</chromosome>
</manhattan>
The stylesheet
At the beginning, we declare the size of the drawing:
<xsl:variable name="width" select="number(1000)"/>
<xsl:variable name="height" select="number(400)"/>
we need to get the size of the genome.
<xsl:variable name="genomeSize">
<xsl:call-template name="sumLengthChrom">
<xsl:with-param name="length" select="number(0)"/>
<xsl:with-param name="node" select="manhattan/chromosome[1]"/>
</xsl:call-template>
</xsl:variable>
We could use the XPath function 'sum()', but here the user is free to omit the size of the chromosome. If the attribute '@length' is not declared, we use the maximum SNP position in this chromosome:
<xsl:template name="sumLengthChrom">
<xsl:param name="length"/>
<xsl:param name="node"/>
<xsl:variable name="chromlen">
<xsl:apply-templates select="$node" mode="length"/>
</xsl:variable>
<xsl:choose>
<xsl:when test="count($node/following-sibling::chromosome)>0">
<xsl:call-template name="sumLengthChrom">
<xsl:with-param name="length" select="$length + number($chromlen)"/>
<xsl:with-param name="node" select="$node/following-sibling::chromosome[1]"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$length + number($chromlen)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
we get the smallest p-value:
<xsl:variable name="minpvalue">
<xsl:for-each select="manhattan/chromosome/data">
<xsl:sort select="@p" data-type="number" order="ascending"/>
<xsl:if test="position() = 1">
<xsl:value-of select="number(@p)"/>
</xsl:if>
</xsl:for-each>
</xsl:variable>
then we plot each chromosome; the xsl parameter "previous" contains the number of bases already plotted.
We use the SVG attribute transform to translate the current chromosome in the drawing
<xsl:template name="plotChromosomes">
<xsl:param name="previous"/>
<xsl:param name="node"/>
(...)
<xsl:attribute name="transform">
<xsl:text>translate(</xsl:text>
<xsl:value-of select="(number($previous) div number($genomeSize)) * $width"/>
<xsl:text>,0)</xsl:text>
</xsl:attribute>
we plot each SNP:
<svg:g style="fill-opacity:0.5;">
<xsl:apply-templates select="data" mode="plot"/>
</svg:g>
and we plot the remaining chromosomes, if any :
<xsl:if test="count($node/following-sibling::chromosome)>0">
<xsl:call-template name="plotChromosomes">
<xsl:with-param name="previous" select="$previous + number($chromlen)"/>
<xsl:with-param name="node" select="$node/following-sibling::chromosome[1]"/>
</xsl:call-template>
</xsl:if>
to plot each SNP, we get the X coordinate in the current chromosome:
<xsl:template match="data" mode="x">
<xsl:variable name="chromWidth">
<xsl:apply-templates select=".." mode="width"/>
</xsl:variable>
<xsl:variable name="chromLength">
<xsl:apply-templates select=".." mode="length"/>
</xsl:variable>
<xsl:value-of select="(number(@position) div number($chromLength)) * $chromWidth"/>
</xsl:template>
and the Y coordinate:
<xsl:template match="data" mode="y">
<xsl:value-of select="$height - (( (math:log(number(@p)) * -1 ) div $maxlog2value ) * $height )"/>
</xsl:template>
we can also wrap the data in a hyperlink if an @rs attribute exists:
<xsl:choose>
<xsl:when test="@rs">
<svg:a target="_blank">
<xsl:attribute name="xlink:href">
<xsl:value-of select="concat('http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=',@rs)"/>
</xsl:attribute>
<xsl:apply-templates select="." mode="plotshape"/>
</svg:a>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="." mode="plotshape"/>
</xsl:otherwise>
</xsl:choose>
we plot the SNP itself, as a circle:
<xsl:template match="data" mode="plotshape">
<svg:circle r="5">
<xsl:attribute name="cx">
<xsl:apply-templates select="." mode="x"/>
</xsl:attribute>
<xsl:attribute name="cy">
<xsl:apply-templates select="." mode="y"/>
</xsl:attribute>
</svg:circle>
</xsl:template>
Result:
$ xsltproc manhattan.xsl model.xml > plot.svg
That's it
Pierre
Posted by Pierre Lindenbaum at 5:54 PM, 0 comments
Labels: bioinformatics, genetics, visualization, xml, xslt
18 February 2015
Automatic code generation for @knime with XSLT: An example with two nodes: fasta reader and writer.
KNIME is a java+eclipse-based graphical workflow-manager.
Biologists in my lab often use this tool to filter VCFs or other tabular data. A Software Development Kit (SDK) is provided to build new nodes. My main problem with this SDK is that you need to write a large number of similar files and you also have to interact with a graphical interface. I wanted to automate the generation of the Java code for new nodes. In the following post I will describe how I wrote two new nodes for reading and writing fasta files.
The nodes are described in an XML file, the Java code is generated with an XSLT stylesheet, and everything is available on GitHub.
Example
We're going to create two nodes for FASTA:
- a fasta reader
- a fasta writer
We define a plugin.xml file; it uses XInclude to include the definition of the two nodes. The base package will be com.github.lindenb.xsltsandbox. The nodes will be displayed in the KNIME workbench under /community/bio/fasta.
<?xml version="1.0" encoding="UTF-8"?> <plugin xmlns:xi="http://www.w3.org/2001/XInclude" package="com.github.lindenb.xsltsandbox" > <category name="bio"> <category name="fasta" label="Fasta"> <xi:include href="node.read-fasta.xml"/> <xi:include href="node.write-fasta.xml"/> </category> </category> </plugin>
node.read-fasta.xml: it takes a file reader (for the input fasta file) and an integer to limit the number of fasta sequences to be read. The output will be a table with two columns (name/sequence). We only write the code for reading the fasta file.
<?xml version="1.0" encoding="UTF-8"?>
<node name="readfasta" label="Read Fasta" description="Reads a Fasta file">
<property type="file-read" name="fastaIn">
<extension>.fa</extension>
<extension>.fasta</extension>
<extension>.fasta.gz</extension>
<extension>.fa.gz</extension>
</property>
<property type="int" name="limit" label="max sequences" description="number of sequences to be fetch. 0 = ALL" default="0">
</property>
<property type="bool" name="upper" label="Uppercase" description="Convert to Uppercase" default="false">
</property>
<outPort name="output">
<column name="title" label="Title" type="string"/>
<column name="sequence" label="Sequence" type="string"/>
</outPort>
<code>
<import>
import java.io.*;
</import>
<body>
@Override
protected BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
{
int limit = this.getPropertyLimitValue();
String url = this.getPropertyFastaInValue();
boolean to_upper = this.getPropertyUpperValue();
getLogger().info("reading "+url);
java.io.BufferedReader r= null;
int n_sequences = 0;
try
{
r = this.openUriForBufferedReader(url);
DataTableSpec dataspec0 = this.createOutTableSpec0();
BufferedDataContainer container0 = exec.createDataContainer(dataspec0);
String seqname="";
StringBuilder sequence=new StringBuilder();
for(;;)
{
exec.checkCanceled();
exec.setMessage("Sequences "+n_sequences);
String line= r.readLine();
if(line==null || line.startsWith(">"))
{
if(!(sequence.length()==0 && seqname.trim().isEmpty()))
{
container0.addRowToTable(new org.knime.core.data.def.DefaultRow(
org.knime.core.data.RowKey.createRowKey(n_sequences),
this.createDataCellsForOutTableSpec0(seqname,sequence)
));
++n_sequences;
}
if(line==null) break;
if( limit!=0 && limit==n_sequences) break;
seqname=line.substring(1);
sequence=new StringBuilder();
}
else
{
line= line.trim();
if( to_upper ) line= line.toUpperCase();
sequence.append(line);
}
}
container0.close();
BufferedDataTable out0 = container0.getTable();
return new BufferedDataTable[]{out0};
}
finally
{
r.close();
}
}
</body>
</code>
</node>
node.write-fasta.xml: it needs an input data table with two columns (name/sequence), an integer to set the length of the lines, and a file writer to write the fasta file.
<?xml version="1.0" encoding="UTF-8"?>
<node name="writefasta" label="Write Fasta" description="Write a Fasta file">
<inPort name="input">
</inPort>
<property type="file-save" name="fastaOut">
</property>
<property type="column" name="title" label="Title" description="Fasta title" data-type="string">
</property>
<property type="column" name="sequence" label="Sequence" description="Fasta Sequence" data-type="string">
</property>
<property type="int" name="fold" label="Fold size" description="Fold sequences greater than..." default="60">
</property>
<code>
<import>
import org.knime.core.data.container.CloseableRowIterator;
import java.io.*;
</import>
<body>
@Override
protected BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
{
CloseableRowIterator iter=null;
BufferedDataTable inTable=inData[0];
int fold = this.getPropertyFoldValue();
int tIndex = this.findTitleRequiredColumnIndex(inTable.getDataTableSpec());
int sIndex = this.findSequenceRequiredColumnIndex(inTable.getDataTableSpec());
PrintWriter w =null;
try
{
w= openFastaOutForPrinting();
int nRows=0;
double total=inTable.getRowCount();
iter=inTable.iterator();
while(iter.hasNext())
{
DataRow row=iter.next();
DataCell tCell =row.getCell(tIndex);
DataCell sCell =row.getCell(sIndex);
w.print(">");
if(!tCell.isMissing())
{
w.print(StringCell.class.cast(tCell).getStringValue());
}
if(!sCell.isMissing())
{
String sequence = StringCell.class.cast(sCell).getStringValue();
for(int i=0;i<sequence.length();++i)
{
if(i%fold == 0) w.println();
w.print(sequence.charAt(i));
exec.checkCanceled();
}
}
w.println();
exec.checkCanceled();
exec.setProgress(nRows/total,"Saving Fasta");
++nRows;
}
w.flush();
return new BufferedDataTable[0];
}
finally
{
if(w!=null) w.close();
}
}
</body>
</code>
</node>
The following Makefile generates the code, compiles it and installs the new plugin in the ${knime.root}/plugins directory:
.PHONY:all clean install run
knime.root=${HOME}/package/knime_2.11.2
all: install
run: install
${knime.root}/knime -clean
install:
rm -rf generated
xsltproc --xinclude \
--stringparam base.dir generated \
knime2java.xsl plugin.xml
$(MAKE) -C generated install knime.root=${knime.root}
clean:
rm -rf generated
The code generated by this Makefile:
$ find generated/ -type f
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeFactory.xml
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodePlugin.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeFactory.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeDialog.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/AbstractReadfastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeFactory.xml
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodePlugin.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeFactory.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeDialog.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/AbstractWritefastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/CompileAll__.java
generated/src/com/github/lindenb/xsltsandbox/AbstractNodeModel.java
generated/MANIFEST.MF
generated/Makefile
generated/plugin.xml
generated/dist/com_github_lindenb_xsltsandbox.jar
generated/dist/com.github.lindenb.xsltsandbox_2015.02.18.jar
The file generated/dist/com.github.lindenb.xsltsandbox_2015.02.18.jar is the one to move to ${knime.root}/plugins.
(At the time of writing I put the jar at http://cardioserve.nantes.inserm.fr/~lindenb/knime/fasta/ )
Open KNIME: the new nodes are now displayed in the Node Repository.
You can now use the nodes; the code is displayed in the documentation:
That's it,
Pierre
Posted by Pierre Lindenbaum at 8:26 PM, 0 comments
Labels: bioinformatics, code, fasta, generator, java, knime, xml, xslt