A Tribble FeatureCodec handling JSON-based annotation files.
I wrote a Java FeatureCodec for JSON with the Tribble library.
Citing the GATK team: "The Tribble project was started as an effort to overhaul our reference-ordered data system; we had many different formats that were shoehorned into a common framework that didn't really work as intended. What we wanted was a common framework that allowed for searching of reference ordered data, regardless of the underlying type. Jim Robinson had developed indexing schemes for text-based files, which was incorporated into the Tribble library."
The library is available at: https://github.com/lindenb/jsontribble.
It contains tools to sort, index and query a JSON file.
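To give an idea of what "sort" and "query" mean for position-based features, here is a minimal sketch in plain Java. It is not the code of jsontribble or Tribble: the `Feature` class, the `BY_POSITION` comparator and the `query` method are hypothetical names, and I assume 0-based, half-open coordinates as in the UCSC tables. A real index avoids the linear scan used here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IntervalQuerySketch {
    /** Hypothetical minimal feature: only the fields needed for ordering and overlap. */
    static final class Feature {
        final String chrom;
        final int start;
        final int end;
        final String name;

        Feature(String chrom, int start, int end, String name) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
            this.name = name;
        }
    }

    /** Orders features by chromosome name, then start, then end. */
    static final Comparator<Feature> BY_POSITION =
        Comparator.comparing((Feature f) -> f.chrom)
                  .thenComparingInt(f -> f.start)
                  .thenComparingInt(f -> f.end);

    /** Returns the features overlapping [start, end); assumes the list is sorted with BY_POSITION. */
    static List<Feature> query(List<Feature> sorted, String chrom, int start, int end) {
        List<Feature> hits = new ArrayList<>();
        for (Feature f : sorted) {
            if (!f.chrom.equals(chrom)) continue;
            if (f.start >= end) break; // sorted by start: no later feature on this chromosome can overlap
            if (f.end > start) hits.add(f);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Feature> features = new ArrayList<>();
        features.add(new Feature("chr1", 949607, 949608, "rs1921"));
        features.add(new Feature("chr1", 881826, 881827, "rs112341375"));
        features.add(new Feature("chr1", 897119, 897120, "rs28530579"));
        features.sort(BY_POSITION);
        for (Feature f : query(features, "chr1", 897119, 981826)) {
            System.out.println(f.name);
        }
    }
}
```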
As a proof of concept, I also created a REST-based service to query those files.
REST/JSON
For example, http://localhost:8080/jsontribble/rest/tribble/resources/dbsnp/annotations.json?chrom=chr1&start=881826&end=981826 returns:

{"header":{"description":"UCSC snp137: select count(*) from snp137 where FIND_IN_SET(func,\"missense\")>0 and avHet>0.1"}
,"features":[
{"chrom":"chr1","start":881826,"end":881827,"name":"rs112341375","score":0,"strand":"+","refNCBI":"G","refUCSC":"G","observed":"C/G","class":"single","valid":["by-frequency"],"avHet":0.5,"func":["missense"],"submitters":["BUSHMAN"]}
{"chrom":"chr1","start":897119,"end":897120,"name":"rs28530579","score":0,"strand":"+","refNCBI":"G","refUCSC":"G","observed":"C/G","class":"single","valid":["unknown"],"avHet":0.375,"func":["missense"],"submitters":["ABI","ENSEMBL","SSAHASNP"]}
{"chrom":"chr1","start":907739,"end":907740,"name":"rs112235940","score":0,"strand":"+","refNCBI":"G","refUCSC":"G","observed":"A/G","class":"single","valid":["unknown"],"avHet":0.5,"func":["missense"],"submitters":["COMPLETE_GENOMICS"]}
{"chrom":"chr1","start":949607,"end":949608,"name":"rs1921","score":0,"strand":"+","refNCBI":"G","refUCSC":"G","observed":"A/C/G","class":"single","valid":["by-cluster","by-frequency","by-1000genomes"],"avHet":0.464348,"func":["missense"],"submitters":["1000GENOMES","AFFY","BGI","BUSHMAN","CGAP-GAI","CLINSEQ_SNP","COMPLETE_GENOMICS","CORNELL","DEBNICK","EXOME_CHIP","GMI","HGSV","ILLUMINA","ILLUMINA-UK","KRIBB_YJKIM","LEE","MGC_GENOME_DIFF","NHLBI-ESP","SC_JCM","SC_SNP","SEATTLESEQ","SEQUENOM","UWGC","WIAF","YUSUKE"],"bitfields":["maf-5-some-pop","maf-5-all-pops"]}
]}
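A client only has to build such a URL and send a GET request. Below is a minimal sketch using the JDK's `java.net.http` client (Java 11+). The URL layout (resource name, `annotations.` plus a format extension, and the `chrom`/`start`/`end` parameters) is inferred from the example URLs in this post; the class and method names are mine, and the host and port are those of my local test server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class TribbleRestClient {
    // Base URL copied from the example; adjust host and port to your own deployment.
    static final String BASE = "http://localhost:8080/jsontribble/rest/tribble/resources";

    /** Builds the query URL for a resource, an output format (json, xml, bed) and a region. */
    static URI buildUri(String resource, String format, String chrom, int start, int end) {
        String encodedChrom = URLEncoder.encode(chrom, StandardCharsets.UTF_8);
        return URI.create(BASE + "/" + resource + "/annotations." + format
            + "?chrom=" + encodedChrom + "&start=" + start + "&end=" + end);
    }

    /** Sends a GET request and returns the response body as text. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = buildUri("dbsnp", "json", "chr1", 881826, 981826);
        System.out.println(uri);
        // Only works when the service is running locally:
        // System.out.println(fetch(uri));
    }
}
```

Switching the `format` argument to `xml` or `bed` gives the URLs used in the next two sections.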
REST/XML
Example: http://localhost:8080/jsontribble/rest/tribble/resources/dbsnp/annotations.xml?chrom=chr1&start=897119&end=981826

<?xml version="1.0" encoding="UTF-8"?>
<annotations chrom="chr1" start="897119" end="981826">
 <header>
  <description>UCSC snp137: select count(*) from snp137 where FIND_IN_SET(func,"missense")>0 and avHet>0.1</description>
 </header>
 <features>
  <feature>
   <chrom>chr1</chrom>
   <start type="integer">897119</start>
   <end type="integer">897120</end>
   <name>rs28530579</name>
   <score type="integer">0</score>
   <strand>+</strand>
   <refNCBI>G</refNCBI>
   <refUCSC>G</refUCSC>
   <observed>C/G</observed>
   <class>single</class>
   <valid>
BED/text
Example: http://localhost:8080/jsontribble/rest/tribble/resources/merge/annotations.bed?chrom=chr1&start=897119&end=981826

chr1 895966 901099 {"chrom":"chr1","start":895966,"end":901099,"strand":"+","name":"uc001aca.2","cds...
chr1 896828 897858 {"chrom":"chr1","start":896828,"end":897858,"strand":"+","name":"uc001acb.1","cds...
chr1 897008 897858 {"chrom":"chr1","start":897008,"end":897858,"strand":"+","name":"uc010nya.1","cds...
chr1 897119 897120 {"chrom":"chr1","start":897119,"end":897120,"name":"rs28530579","score":0,"strand...
chr1 897734 899229 {"chrom":"chr1","start":897734,"end":899229,"strand":"+","name":"uc010nyb.1","cds...
chr1 901876 910484 {"chrom":"chr1","start":901876,"end":910484,"strand":"+","name":"uc001acd.3","cds...
chr1 901876 910484 {"chrom":"chr1","start":901876,"end":910484,"strand":"+","name":"uc001ace.3","cds...
chr1 901876 910484 {"chrom":"chr1","start":901876,"end":910484,"strand":"+","name":"uc001acf.3","cds...
chr1 907739 907740 {"chrom":"chr1","start":907739,"end":907740,"name":"rs112235940","score":0,"stran...
chr1 910578 917473 {"chrom":"chr1","start":910578,"end":917473,"strand":"-","name":"uc001ach.2","cds...
chr1 934341 935552 {"chrom":"chr1","start":934341,"end":935552,"strand":"-","name":"uc001aci.2","cds...
chr1 934341 935552 {"chrom":"chr1","start":934341,"end":935552,"strand":"-","name":"uc010nyc.1","cds...
chr1 948846 949919 {"chrom":"chr1","start":948846,"end":949919,"strand":"+","name":"uc001acj.4","cds...
chr1 949607 949608 {"chrom":"chr1","start":949607,"end":949608,"name":"rs1921","score":0,"strand":"+...
chr1 955502 991499 {"chrom":"chr1","start":955502,"end":991499,"strand":"+","name":"uc001ack.2","cds...
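Each line of that output has three positional columns followed by the JSON record. A downstream tool could split it as in the sketch below; the `BedJsonLine` class is a hypothetical helper, and I assume the columns are separated by whitespace. Because the fourth column is shortened in the listing above, the sketch keeps it as a plain string rather than parsing it as JSON.

```java
public class BedJsonLine {
    final String chrom;
    final int start;
    final int end;
    final String json; // raw fourth column, kept as text

    BedJsonLine(String chrom, int start, int end, String json) {
        this.chrom = chrom;
        this.start = start;
        this.end = end;
        this.json = json;
    }

    /** Parses one line: chrom, start, end, then everything else as the JSON column. */
    static BedJsonLine parse(String line) {
        // The limit of 4 keeps any whitespace inside the JSON column intact.
        String[] cols = line.trim().split("\\s+", 4);
        if (cols.length < 4) {
            throw new IllegalArgumentException("expected 4 columns: " + line);
        }
        return new BedJsonLine(cols[0], Integer.parseInt(cols[1]),
                               Integer.parseInt(cols[2]), cols[3]);
    }

    public static void main(String[] args) {
        BedJsonLine f = parse("chr1\t897119\t897120\t{\"chrom\":\"chr1\",\"name\":\"rs28530579\"}");
        System.out.println(f.chrom + ":" + f.start + "-" + f.end + " " + f.json);
    }
}
```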