12 July 2013

Inside the Variation Toolkit: Gene Ontology for VCF, GUI for VCF

A quick note about three Java-based tools for VCF files that I wrote today.

VcfViewGui

VcfViewGui: a simple Java Swing-based VCF viewer.
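For readers new to Swing, here is a minimal sketch of how VCF data lines could be loaded into a table model that a JTable can display. This is not the VcfViewGui source. The class name VcfTableSketch is hypothetical, and the sketch assumes a plain, uncompressed, tab-separated VCF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import javax.swing.table.DefaultTableModel;

/** Hypothetical sketch: load a VCF into a Swing table model (not the VcfViewGui source). */
public class VcfTableSketch {

    /** Reads VCF text: the '#CHROM' header line gives the column names, data lines become rows. */
    public static DefaultTableModel load(BufferedReader in) throws IOException {
        DefaultTableModel model = new DefaultTableModel();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("##")) continue;            // skip meta-information lines
            String[] tokens = line.split("\t");             // VCF columns are tab-separated
            if (line.startsWith("#CHROM")) {
                tokens[0] = tokens[0].substring(1);         // drop the leading '#'
                model.setColumnIdentifiers(tokens);
            } else if (!line.isEmpty()) {
                model.addRow(tokens);
            }
        }
        return model;
    }

    public static void main(String[] args) throws IOException {
        String vcf = "##fileformat=VCFv4.1\n"
                + "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
                + "chr1\t69270\t.\tA\tG\t2694.18\t.\tAC=40\n";
        DefaultTableModel model = load(new BufferedReader(new StringReader(vcf)));
        // A viewer would then show it with new javax.swing.JTable(model) inside a JFrame.
        System.out.println(model.getRowCount() + " row(s), " + model.getColumnCount() + " columns");
    }
}
```

Keeping the parsing in a TableModel, separate from the JFrame, lets the same loader be tested without a display.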


VCFGeneOntology

vcfgo reads a VCF annotated with VEP or SnpEff, loads the data from GeneOntology and GOA, and adds a new GOA field to the INFO column listing the GO terms of the gene(s) found at each position.
Example:
$ java -jar dist/vcfgo.jar I="https://raw.github.com/arq5x/gemini/master/test/tes.snpeff.vcf" |\
    grep -v -E '^##' | head -n 3

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1094PC0005  1094PC0009  1094PC0012  1094PC0013
chr1    30860   .   G   C   33.46   .   AC=2;AF=0.053;AN=38;BaseQRankSum=2.327;DP=49;Dels=0.00;EFF=DOWNSTREAM(MODIFIER||||85|FAM138A|protein_coding|CODING|ENST00000417324|),DOWNSTREAM(MODIFIER|||||FAM138A|processed_transcript|CODING|ENST00000461467|),DOWNSTREAM(MODIFIER|||||MIR1302-10|miRNA|NON_CODING|ENST00000408384|),INTRON(MODIFIER|||||MIR1302-10|antisense|NON_CODING|ENST00000469289|),INTRON(MODIFIER|||||MIR1302-10|antisense|NON_CODING|ENST00000473358|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000423562|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000430492|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000438504|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000488147|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000538476|);FS=3.128;HRun=0;HaplotypeScore=0.6718;InbreedingCoeff=0.1005;MQ=36.55;MQ0=0;MQRankSum=0.217;QD=16.73;ReadPosRankSum=2.017 GT:AD:DP:GQ:PL  0/0:7,0:7:15.04:0,15,177    0/0:2,0:2:3.01:0,3,39   0/0:6,0:6:12.02:0,12,143    0/0:4,0:4:9.03:0,9,119
chr1    69270   .   A   G   2694.18 .   AC=40;AF=1.000;AN=40;DP=83;Dels=0.00;EFF=SYNONYMOUS_CODING(LOW|SILENT|tcA/tcG|S60|305|OR4F5|protein_coding|CODING|ENST00000335137|exon_1_69091_70008);FS=0.000;GOA=OR4F5|GO:0004984&GO:0005886&GO:0004930&GO:0016021;HRun=0;HaplotypeScore=0.0000;InbreedingCoeff=-0.0598;MQ=31.06;MQ0=0;QD=32.86   GT:AD:DP:GQ:PL  ./. ./. 1/1:0,3:3:9.03:106,9,0  1/1:0,6:6:18.05:203,18,0
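The GOA value in the output above has the shape GENE|GO:a&GO:b. The sketch below shows one way such a value could be built from a SnpEff EFF field. It is a hypothetical illustration, not the vcfgo code. It assumes that the gene name is the sixth pipe-separated field inside each EFF(...) entry, as in the records above. It also assumes that genes without GO terms are skipped, which is consistent with the first record receiving no GOA field. Finally, it assumes that several annotated genes would be comma-separated. In the real tool, the gene-to-GO map comes from the GOA file.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of building a GOA INFO value; not the vcfgo implementation. */
public class GoaFieldSketch {

    /** Extracts gene names (index 5 of the '|'-separated fields) from a SnpEff EFF value. */
    public static Set<String> genesFromEff(String eff) {
        Set<String> genes = new LinkedHashSet<>();           // keeps first-seen order, no duplicates
        for (String effect : eff.split(",")) {               // effects are comma-separated
            int open = effect.indexOf('('), close = effect.lastIndexOf(')');
            if (open < 0 || close < open) continue;
            String[] fields = effect.substring(open + 1, close).split("\\|", -1); // keep empty fields
            if (fields.length > 5 && !fields[5].isEmpty()) genes.add(fields[5]);
        }
        return genes;
    }

    /** Builds "GENE|GO:x&GO:y"; unannotated genes are skipped. Returns null if nothing remains. */
    public static String goaValue(Set<String> genes, Map<String, List<String>> geneToGo) {
        List<String> parts = new ArrayList<>();
        for (String gene : genes) {
            List<String> terms = geneToGo.get(gene);
            if (terms == null || terms.isEmpty()) continue;
            parts.add(gene + "|" + String.join("&", terms));
        }
        return parts.isEmpty() ? null : String.join(",", parts); // ',' between genes is an assumption
    }

    public static void main(String[] args) {
        String eff = "SYNONYMOUS_CODING(LOW|SILENT|tcA/tcG|S60|305|OR4F5|protein_coding|CODING|ENST00000335137|exon_1_69091_70008)";
        Map<String, List<String>> geneToGo = Map.of("OR4F5",
                List.of("GO:0004984", "GO:0005886", "GO:0004930", "GO:0016021"));
        // prints GOA=OR4F5|GO:0004984&GO:0005886&GO:0004930&GO:0016021
        System.out.println("GOA=" + goaValue(genesFromEff(eff), geneToGo));
    }
}
```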

VCFFilterGeneOntology

vcffiltergo reads a VCF annotated with VEP or SnpEff, loads the data from GeneOntology and GOA, and writes the value given by FILTER into the FILTER column when a gene at the current genomic location is annotated with the GO term given by CHILD_OF or one of its descendants. Other variants are marked PASS.
Example:
$ java -jar dist/vcffiltergo.jar I="https://raw.github.com/arq5x/gemini/master/test/test1.snpeff.vcf" \
    CHILD_OF=GO:0005886 FILTER=MEMBRANE  |\
    grep -v "^##"   | head -n 3

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1094PC0005  1094PC0009  1094PC0012  1094PC0013
chr1    30860   .   G   C   33.46   PASS    AC=2;AF=0.053;AN=38;BaseQRankSum=2.327;DP=49;Dels=0.00;EFF=DOWNSTREAM(MODIFIER||||85|FAM138A|protein_coding|CODING|ENST00000417324|),DOWNSTREAM(MODIFIER|||||FAM138A|processed_transcript|CODING|ENST00000461467|),DOWNSTREAM(MODIFIER|||||MIR1302-10|miRNA|NON_CODING|ENST00000408384|),INTRON(MODIFIER|||||MIR1302-10|antisense|NON_CODING|ENST00000469289|),INTRON(MODIFIER|||||MIR1302-10|antisense|NON_CODING|ENST00000473358|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000423562|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000430492|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000438504|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000488147|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000538476|);FS=3.128;HRun=0;HaplotypeScore=0.6718;InbreedingCoeff=0.1005;MQ=36.55;MQ0=0;MQRankSum=0.217;QD=16.73;ReadPosRankSum=2.017 GT:AD:DP:GQ:PL  0/0:7,0:7:15.04:0,15,177    0/0:2,0:2:3.01:0,3,39   0/0:6,0:6:12.02:0,12,143    0/0:4,0:4:9.03:0,9,119
chr1    69270   .   A   G   2694.18 MEMBRANE    AC=40;AF=1.000;AN=40;DP=83;Dels=0.00;EFF=SYNONYMOUS_CODING(LOW|SILENT|tcA/tcG|S60|305|OR4F5|protein_coding|CODING|ENST00000335137|exon_1_69091_70008);FS=0.000;HRun=0;HaplotypeScore=0.0000;InbreedingCoeff=-0.0598;MQ=31.06;MQ0=0;QD=32.86 GT:AD:DP:GQ:PL  ./. ./. 1/1:0,3:3:9.03:106,9,0  1/1:0,6:6:18.05:203,18,0
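The test "is this term CHILD_OF the target?" is a reachability question on the GO graph, a DAG in which a term can have several parents. Here is a minimal sketch of that check as a breadth-first walk up the parent links. It assumes a simple child-to-parents map, whereas the real tool builds its tree from the GeneOntology file. The class name and the toy term IDs below are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a CHILD_OF test on a GO-like DAG; not the vcffiltergo code. */
public class GoDescendantFilterSketch {

    /** True if 'term' equals 'target' or reaches it by following parent links. */
    public static boolean isSelfOrDescendant(String term, String target,
                                             Map<String, List<String>> parents) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(term);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) return true;
            if (!seen.add(current)) continue;        // in a DAG a term may be reached twice
            queue.addAll(parents.getOrDefault(current, List.of()));
        }
        return false;
    }

    /** Returns filterName if any of a gene's terms matches the target, else "PASS". */
    public static String filterFor(Collection<String> geneTerms, String target,
                                   String filterName, Map<String, List<String>> parents) {
        for (String t : geneTerms) {
            if (isSelfOrDescendant(t, target, parents)) return filterName;
        }
        return "PASS";
    }

    public static void main(String[] args) {
        // Toy parent links, for illustration only (not real GO relationships).
        Map<String, List<String>> parents = Map.of(
                "toy:child", List.of("toy:mid"),
                "toy:mid", List.of("GO:0005886"));
        System.out.println(filterFor(List.of("toy:child"), "GO:0005886", "MEMBRANE", parents));  // MEMBRANE
        System.out.println(filterFor(List.of("GO:0005886"), "GO:0005886", "MEMBRANE", parents)); // MEMBRANE
        System.out.println(filterFor(List.of(), "GO:0005886", "MEMBRANE", parents));             // PASS
    }
}
```

The seen set keeps the walk linear in the number of terms visited, even when many paths lead to the same ancestor.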



That's it,

Pierre

2 comments:

RxAwesome33 said...

Hi Pierre,
First off, thank you for your tremendous effort in the bioinformatics community. I am seeking help implementing your VcfGO script. Would you please take a look at the error message and point me the right way? Thank you again -Keller.
hart@hart-ubuntu:~/jvarkit$ java -jar dist/vcfgo.jar I=/home/hart/BigData/VCF/Ef1_7_29_2014Eff.vcf GO_INPUT=http://geneontology.org/gene-associations/gene_association.fb.gz GOA_INPUT=ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/FLY/gene_association.goa_fly.gz OUT=/home/hart/BigData/VCF/EffGO.vcf
[Wed Feb 11 12:34:55 CST 2015] com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology GOA=ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/FLY/gene_association.goa_fly.gz GO=http://geneontology.org/gene-associations/gene_association.fb.gz IN=/home/hart/BigData/VCF/Ef1_7_29_2014Eff.vcf OUT=/home/hart/BigData/VCF/EffGO.vcf VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Feb 11 12:34:55 CST 2015] Executing as hart@hart-ubuntu on Linux 3.13.0-45-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_75-b13; Picard version: null JdkDeflater
INFO 2015-02-11 12:34:55 AbstractVCFFilter reading from /home/hart/BigData/VCF/Ef1_7_29_2014Eff.vcf
INFO 2015-02-11 12:34:55 AbstractVCFFilter writing to /home/hart/BigData/VCF/EffGO.vcf
INFO 2015-02-11 12:34:55 AbstractVcfGeneOntology read GO http://geneontology.org/gene-associations/gene_association.fb.gz
java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
at com.github.lindenb.jvarkit.tools.vcfgo.AbstractVcfGeneOntology.readGO(AbstractVcfGeneOntology.java:60)
at com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology.doWork(VcfGeneOntology.java:35)
at com.github.lindenb.jvarkit.util.vcf.AbstractVCFFilter.doWork(AbstractVCFFilter.java:73)
at com.github.lindenb.jvarkit.util.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:179)
at com.github.lindenb.jvarkit.util.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:120)
at com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology.main(VcfGeneOntology.java:89)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
at com.github.lindenb.jvarkit.util.go.GoTree.parse(GoTree.java:286)
at com.github.lindenb.jvarkit.util.go.GoTree.parse(GoTree.java:311)
at com.github.lindenb.jvarkit.tools.vcfgo.AbstractVcfGeneOntology.readGO(AbstractVcfGeneOntology.java:55)
... 5 more
ERROR 2015-02-11 12:34:56 AbstractVCFFilter

Pierre Lindenbaum said...

Hi, can you please post this problem at https://github.com/lindenb/jvarkit/issues ?