24 May 2013

A Tribble/FeatureCodec handling JSON-based annotation files.

I wrote a Java FeatureCodec for JSON with the Tribble library.
Citing the GATK team: "The Tribble project was started as an effort to overhaul our reference-ordered data system; we had many different formats that were shoehorned into a common framework that didn't really work as intended. What we wanted was a common framework that allowed for searching of reference ordered data, regardless of the underlying type. Jim Robinson had developed indexing schemes for text-based files, which was incorporated into the Tribble library."

The library is available at: https://github.com/lindenb/jsontribble


The library contains the tools to sort, index and query a JSON annotation file.
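
To make "sort" and "query" concrete, here is a minimal sketch of what those two steps mean for interval data. It is not the code from jsontribble and it does not use the Tribble index: the class IntervalSketch, the Feature holder and the linear scan are all hypothetical, and the half-open overlap test (start inclusive, end exclusive) is an assumption based on the UCSC-style coordinates of the dbSNP records shown in this post.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IntervalSketch {
    /** Hypothetical minimal feature: only the fields needed for sorting and querying. */
    static final class Feature {
        final String chrom;
        final int start;
        final int end;
        final String name;

        Feature(String chrom, int start, int end, String name) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
            this.name = name;
        }
    }

    /** Sort order assumed here: chromosome name (lexicographic), then start, then end. */
    static final Comparator<Feature> BY_POSITION =
        Comparator.comparing((Feature f) -> f.chrom)
                  .thenComparingInt(f -> f.start)
                  .thenComparingInt(f -> f.end);

    /** Half-open overlap test (an assumption, see the text above). */
    static boolean overlaps(Feature f, String chrom, int start, int end) {
        return f.chrom.equals(chrom) && f.start < end && f.end > start;
    }

    /** Linear scan over the features; a real index would avoid reading every record. */
    static List<String> query(List<Feature> features, String chrom, int start, int end) {
        List<String> hits = new ArrayList<>();
        for (Feature f : features) {
            if (overlaps(f, chrom, start, end)) {
                hits.add(f.name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Feature> features = new ArrayList<>();
        // Coordinates copied from the dbSNP records shown in this post, deliberately unsorted.
        features.add(new Feature("chr1", 949607, 949608, "rs1921"));
        features.add(new Feature("chr1", 881826, 881827, "rs112341375"));
        features.add(new Feature("chr1", 897119, 897120, "rs28530579"));
        features.sort(BY_POSITION);
        System.out.println(query(features, "chr1", 897119, 981826));
    }
}
```

Sorting first is what makes an index possible at all: once the records are ordered by position, an index only has to remember where each region starts in the file.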

As a proof of concept, I also created a REST-based service to query those files.

REST/JSON

For example http://localhost:8080/jsontribble/rest/tribble/resources/dbsnp/annotations.json?chrom=chr1&start=881826&end=981826 returns:
{"header":{"description":"UCSC  snp137: select count(*) from snp137 where FIND_IN_SET(func,\"missense\")>0 and avHet>0.1"}
,"features":[
{"chrom":"chr1","start":881826,"end":881827,"name":"rs112341375","score":0,"strand":"+","refNCBI":"G","refUCSC":"G","observed":"C/G","class":"single","valid":["by-frequency"],"avHet":0.5,"func":["missense"],"submitters":["BUSHMAN"]}
{"chrom":"chr1","start":897119,"end":897120,"name":"rs28530579","score":0,"strand":"+","refNCBI":"G","refUCSC":"G","observed":"C/G","class":"single","valid":["unknown"],"avHet":0.375,"func":["missense"],"submitters":["ABI","ENSEMBL","SSAHASNP"]}
{"chrom":"chr1","start":907739,"end":907740,"name":"rs112235940","score":0,"strand":"+","refNCBI":"G","refUCSC":"G","observed":"A/G","class":"single","valid":["unknown"],"avHet":0.5,"func":["missense"],"submitters":["COMPLETE_GENOMICS"]}
{"chrom":"chr1","start":949607,"end":949608,"name":"rs1921","score":0,"strand":"+","refNCBI":"G","refUCSC":"G","observed":"A/C/G","class":"single","valid":["by-cluster","by-frequency","by-1000genomes"],"avHet":0.464348,"func":["missense"],"submitters":["1000GENOMES","AFFY","BGI","BUSHMAN","CGAP-GAI","CLINSEQ_SNP","COMPLETE_GENOMICS","CORNELL","DEBNICK","EXOME_CHIP","GMI","HGSV","ILLUMINA","ILLUMINA-UK","KRIBB_YJKIM","LEE","MGC_GENOME_DIFF","NHLBI-ESP","SC_JCM","SC_SNP","SEATTLESEQ","SEQUENOM","UWGC","WIAF","YUSUKE"],"bitfields":["maf-5-some-pop","maf-5-all-pops"]}
]}

REST/XML

Example: http://localhost:8080/jsontribble/rest/tribble/resources/dbsnp/annotations.xml?chrom=chr1&start=897119&end=981826 returns:
<?xml version="1.0" encoding="UTF-8"?>
<annotations chrom="chr1" start="897119" end="981826">
  <header>
    <description>UCSC  snp137: select count(*) from snp137 where FIND_IN_SET(func,"missense")&gt;0 and avHet&gt;0.1</description>
  </header>
  <features>
    <feature>
      <chrom>chr1</chrom>
      <start type="integer">897119</start>
      <end type="integer">897120</end>
      <name>rs28530579</name>
      <score type="integer">0</score>
      <strand>+</strand>
      <refNCBI>G</refNCBI>
      <refUCSC>G</refUCSC>
      <observed>C/G</observed>
      <class>single</class>
      <valid>

BED/text

Example: http://localhost:8080/jsontribble/rest/tribble/resources/merge/annotations.bed?chrom=chr1&start=897119&end=981826 returns:
chr1    895966  901099  {"chrom":"chr1","start":895966,"end":901099,"strand":"+","name":"uc001aca.2","cds...
chr1    896828  897858  {"chrom":"chr1","start":896828,"end":897858,"strand":"+","name":"uc001acb.1","cds...
chr1    897008  897858  {"chrom":"chr1","start":897008,"end":897858,"strand":"+","name":"uc010nya.1","cds...
chr1    897119  897120  {"chrom":"chr1","start":897119,"end":897120,"name":"rs28530579","score":0,"strand...
chr1    897734  899229  {"chrom":"chr1","start":897734,"end":899229,"strand":"+","name":"uc010nyb.1","cds...
chr1    901876  910484  {"chrom":"chr1","start":901876,"end":910484,"strand":"+","name":"uc001acd.3","cds...
chr1    901876  910484  {"chrom":"chr1","start":901876,"end":910484,"strand":"+","name":"uc001ace.3","cds...
chr1    901876  910484  {"chrom":"chr1","start":901876,"end":910484,"strand":"+","name":"uc001acf.3","cds...
chr1    907739  907740  {"chrom":"chr1","start":907739,"end":907740,"name":"rs112235940","score":0,"stran...
chr1    910578  917473  {"chrom":"chr1","start":910578,"end":917473,"strand":"-","name":"uc001ach.2","cds...
chr1    934341  935552  {"chrom":"chr1","start":934341,"end":935552,"strand":"-","name":"uc001aci.2","cds...
chr1    934341  935552  {"chrom":"chr1","start":934341,"end":935552,"strand":"-","name":"uc010nyc.1","cds...
chr1    948846  949919  {"chrom":"chr1","start":948846,"end":949919,"strand":"+","name":"uc001acj.4","cds...
chr1    949607  949608  {"chrom":"chr1","start":949607,"end":949608,"name":"rs1921","score":0,"strand":"+...
chr1    955502  991499  {"chrom":"chr1","start":955502,"end":991499,"strand":"+","name":"uc001ack.2","cds...
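
The three example URLs above share the same shape: a resource name (dbsnp or merge), a file extension that selects the output format (json, xml or bed), and the chrom, start and end parameters. A small client could be sketched as follows. The class AnnotationQuery and its methods are hypothetical names of mine, not part of jsontribble, and the base URL is simply the one of the local test server used in the examples.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AnnotationQuery {
    // Base path copied from the example URLs; host and port belong to a local test server.
    static final String BASE = "http://localhost:8080/jsontribble/rest/tribble/resources/";

    /** Builds a query URL; format is the file extension: "json", "xml" or "bed". */
    static String buildUrl(String resource, String format, String chrom, int start, int end) {
        return BASE + resource + "/annotations." + format
            + "?chrom=" + URLEncoder.encode(chrom, StandardCharsets.UTF_8)
            + "&start=" + start
            + "&end=" + end;
    }

    /** Downloads the response body as text; only works while the service is running. */
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Only prints the URLs, so this runs without a server; pass one to fetch() to query it.
        System.out.println(buildUrl("dbsnp", "json", "chr1", 881826, 981826));
        System.out.println(buildUrl("dbsnp", "xml", "chr1", 897119, 981826));
        System.out.println(buildUrl("merge", "bed", "chr1", 897119, 981826));
    }
}
```

The sketch needs Java 11 or later (URLEncoder.encode with a Charset, InputStream.readAllBytes).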
