My first walker for the GATK : my notebook
This is my first notebook for developping a new Walker for the Genome Analysis Toolkit. This post was mostly inspired by the following pdf: kvg_20_line_lifesavers_mad_v2.pptx.pdf.
Get the sources
git clone http://github.com/broadgsa/gatk.git GATK.devthe javac compiler also requires the following library from google :http://code.google.com/p/cofoja/.
A first "Short-Reads" walker
The following class ReadWalker scans the reads and print them as fasta. The @Output annotation tells the GATK that we're going to channel our output through the java.io.PrintStream object. This field is automatically filled by the application runtime.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
package mygatk; | |
import java.io.PrintStream; | |
import org.broadinstitute.sting.commandline.Output; | |
import org.broadinstitute.sting.gatk.contexts.ReferenceContext; | |
import org.broadinstitute.sting.gatk.refdata.ReadMetaDataTracker; | |
import org.broadinstitute.sting.gatk.walkers.ReadWalker; | |
import org.broadinstitute.sting.utils.sam.GATKSAMRecord; | |
public class HelloRead extends ReadWalker<Integer, Integer> | |
{ | |
/** | |
tells the GATK that weʼre going to | |
channel our output through this | |
object. */ | |
@Output(shortName="o",fullName="output",doc="save as") | |
PrintStream out; | |
@Override | |
public Integer map(ReferenceContext ref, GATKSAMRecord read, | |
ReadMetaDataTracker metaDataTracker) { | |
out.println(">"+read.getReadName()); | |
out.println(new String(read.getReadBases())); | |
return null; | |
} | |
@Override | |
public Integer reduceInit() | |
{ | |
return null; | |
} | |
@Override | |
public Integer reduce(Integer value, Integer sum) | |
{ | |
return null; | |
} | |
} |
Compilation
javac -cp /path/to/GenomeAnalysisTK.jar:/path/to/cofoja-1.0-r139.jar:. \ -sourcepath src \ -d tmp src/mygatk/HelloRead.java jar cvf HelloRead.jar -C tmp .
Running
Here I'm using a BAM from the 'examples' folder of samtools. (We need to pre-process this BAM with picard AddOrReplaceReadGroups). We then use our library as follow:java -cp path/to/GenomeAnalysisTK.jar:HelloRead.jar \ org.broadinstitute.sting.gatk.CommandLineGATK -T HelloRead \ -I test.bam \ -R ${SAMTOOLS}/examples/ex1.fa
Result:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
mkdir -p tmp | |
javac -cp /home/lindenb/package/GenomeAnalysisTK-1.4-37-g0b29d54/GenomeAnalysisTK.jar:/home/lindenb/package/GATK.dev/lib/cofoja-1.0-r139.jar:. -sourcepath src -d tmp src/mygatk/HelloRead.java | |
warning: Supported source version 'RELEASE_6' from annotation processor 'com.google.java.contract.core.apt.AnnotationProcessor' less than -source '1.7' | |
1 warning | |
jar cvf HelloRead.jar -C tmp . | |
added manifest | |
adding: mygatk/(in = 0) (out= 0)(stored 0%) | |
adding: mygatk/HelloRead.class(in = 1861) (out= 803)(deflated 56%) | |
java -jar ~/package/picard/picard-tools-1.63/AddOrReplaceReadGroups.jar \ | |
I=/home/lindenb/package/samtools-0.1.18/examples/ex1.bam \ | |
O=test.bam \ | |
RGLB=test.lib \ | |
RGPL=illumina \ | |
RGPU=testunit \ | |
SM=test.sample \ | |
VALIDATION_STRINGENCY=LENIENT | |
[Thu Mar 08 16:17:55 CET 2012] net.sf.picard.sam.AddOrReplaceReadGroups INPUT=/home/lindenb/package/samtools-0.1.18/examples/ex1.bam OUTPUT=test.bam RGLB=test.lib RGPL=illumina RGPU=testunit RGSM=test.sample VALIDATION_STRINGENCY=LENIENT RGID=1 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false | |
[Thu Mar 08 16:17:55 CET 2012] Executing as lindenb@hardyweinberg on Linux 3.0.0-16-generic i386; Java HotSpot(TM) Server VM 1.7.0_01-b08; Picard version: 1.63(1131) | |
INFO 2012-03-08 16:17:55 AddOrReplaceReadGroups Created read group ID=1 PL=illumina LB=test.lib SM=test.sample | |
[Thu Mar 08 16:17:55 CET 2012] net.sf.picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.01 minutes. | |
Runtime.totalMemory()=54853632 | |
/home/lindenb/package/samtools-0.1.18/samtools index test.bam | |
java -cp /home/lindenb/package/GenomeAnalysisTK-1.4-37-g0b29d54/GenomeAnalysisTK.jar:HelloRead.jar org.broadinstitute.sting.gatk.CommandLineGATK -T HelloRead \ | |
-I test.bam \ | |
-R /home/lindenb/package/samtools-0.1.18/examples/ex1.fa | |
INFO 16:17:57,700 HelpFormatter - --------------------------------------------------------------------------------- | |
INFO 16:17:57,702 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-37-g0b29d54, Compiled 2012/02/27 19:56:39 | |
INFO 16:17:57,702 HelpFormatter - Copyright (c) 2010 The Broad Institute | |
INFO 16:17:57,702 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki | |
INFO 16:17:57,703 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa | |
INFO 16:17:57,703 HelpFormatter - Program Args: -T HelloRead -I test.bam -R /home/lindenb/package/samtools-0.1.18/examples/ex1.fa | |
INFO 16:17:57,703 HelpFormatter - Date/Time: 2012/03/08 16:17:57 | |
INFO 16:17:57,703 HelpFormatter - --------------------------------------------------------------------------------- | |
INFO 16:17:57,704 HelpFormatter - --------------------------------------------------------------------------------- | |
INFO 16:17:57,707 GenomeAnalysisEngine - Strictness is SILENT | |
INFO 16:17:57,743 SAMDataSource$SAMReaders - Initializing SAMRecords in serial | |
INFO 16:17:57,758 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 | |
>B7_591:4:96:693:509 | |
CACTAGTGGCTCATTGTAAATGTGTGGTTTAACTCG | |
INFO 16:17:57,972 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING] | |
INFO 16:17:57,973 TraversalEngine - Location processed.reads runtime per.1M.reads completed total.runtime remaining | |
>EAS54_65:7:152:368:113 | |
CTAGTGGCTCATTGTAAATGTGTGGTTTAACTCGT | |
>EAS51_64:8:5:734:57 | |
AGTGGCTCATTGTAAATGTGTGGTTTAACTCGTCC | |
>B7_591:1:289:587:906 | |
GTGGCTCATTGTAATTTTTTGTTTTAACTCTTCTCT | |
>EAS56_59:8:38:671:758 | |
GCTCATTGTAAATGTGTGGTTTAACTCGTCCATGG | |
>EAS56_61:6:18:467:281 | |
ATTGTAAATGTGTGGTTTAACTCGTCCCTGGCCCA | |
>EAS114_28:5:296:340:699 | |
ATTGTAAATGTGTGGTTTAACTCGTCCATGGCCCAG | |
>B7_597:6:194:894:408 | |
TGTAAATGTGTGGTTTAACTCGTCCATTGCCCAGC | |
>EAS188_4:8:12:628:973 | |
AAATGTGTGGTTTAACTCGTCCATGGCCCAGCATT | |
(...) | |
>EAS139_11:7:50:1229:1313 | |
TTTTTTCTTTTTTTTTTTTTTTTTTTTGCATGCCA | |
>EAS54_65:3:320:20:250 | |
TTTTTTTTTTTTTTTTTTTTTTTGCATGCCAGAAA | |
>EAS114_26:7:37:79:581 | |
TTTTTTTTTTTTTTTTTTTTTTTCATGCCAGAAAA | |
INFO 16:18:40,276 Walker - [REDUCE RESULT] Traversal result is: null | |
INFO 16:18:40,277 TraversalEngine - Total runtime 0.45 secs, 0.01 min, 0.00 hours | |
INFO 16:18:40,298 TraversalEngine - 0 reads were filtered out during traversal out of 3310 total (0.00%) |
The Makefile
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
GATKDIR=${HOME}/package/GenomeAnalysisTK-1.4-37-g0b29d54 | |
COFOJA=${HOME}/package/GATK.dev/lib/cofoja-1.0-r139.jar | |
SAMTOOLS=${HOME}/package/samtools-0.1.18 | |
test:compile | |
java -jar ~/package/picard/picard-tools-1.63/AddOrReplaceReadGroups.jar \ | |
I=${SAMTOOLS}/examples/ex1.bam \ | |
O=test.bam \ | |
RGLB=test.lib \ | |
RGPL=illumina \ | |
RGPU=testunit \ | |
SM=test.sample \ | |
VALIDATION_STRINGENCY=LENIENT | |
${SAMTOOLS}/samtools index test.bam | |
java -cp ${GATKDIR}/GenomeAnalysisTK.jar:HelloRead.jar org.broadinstitute.sting.gatk.CommandLineGATK -T HelloRead \ | |
-I test.bam \ | |
-R ${SAMTOOLS}/examples/ex1.fa | |
compile: | |
mkdir -p tmp | |
javac -cp ${GATKDIR}/GenomeAnalysisTK.jar:${COFOJA}:. -sourcepath src -d tmp src/mygatk/HelloRead.java | |
jar cvf HelloRead.jar -C tmp . |