28 May 2014

Uniprot → SVG

My colleague @Solena recently asked me to write a tool that would help her to prepare a large number of figures for an article. The tool I wrote fetches an entry for a given Uniprot accession and creates a SVG diagram , editable in Inkscape. It is available at


That's it,

Pierre

22 May 2014

Breaking the " same origin security policy" with CORS. An example with @GenomeBrowser / DAS.

Jerven Bolleman recently taught me about the CORS/Cross-origin resource sharing:



"Cross-origin resource sharing (CORS) is a mechanism that allows many resources (e.g. fonts, JavaScript, etc.) on a web page to be requested from another domain outside of the domain the resource originated from. In particular, JavaScript's AJAX calls can use the XMLHttpRequest mechanism. Such "cross-domain" requests would otherwise be forbidden by web browsers, per the same origin security policy."

I've created a page testing if some bioinformatics web-service support CORS. This page is available at : http://lindenb.github.io/pages/cors/index.html

Interestingly NCBI, Uniprot and UCSC support CORS. As an example, the following <form> fetches a DNA sequence using the DAS server of the UCSC and display it:



The script:

var UcscCORS={
createXMLHttpRequest:function(method, url)
{
var xhr = new XMLHttpRequest();
if ("withCredentials" in xhr)
{
xhr.open(method, url, true);
}
else if (typeof XDomainRequest != "undefined")
{
xhr = new XDomainRequest();
xhr.open(method, url);
}
else
{
xhr = null;
}
return xhr;
},
removeAllChildren:function(root)
{
while(root.firstChild) root.removeChild(root.firstChild);
return root;
},
getElementByName:function(root,name)
{
if(root==null) return null;
for(var n=root.firstChild;n!=null;n=n.nextSibling)
{
if(n.nodeType!=1) continue;
if(n.nodeName!=name) continue;
return n;
}
return null;
},
fetchUCSC:function()
{
var _this=this;
var ucsc_position=document.getElementById("ucsc_position");
this.removeAllChildren(ucsc_position);
var xhr=this.createXMLHttpRequest("GET",
"http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment="+
encodeURIComponent(ucsc_position.value.trim())
);
xhr.onload=function()
{
var xmlDoc = xhr.responseXML;
var E= _this.getElementByName(xmlDoc.documentElement,"SEQUENCE");
E= _this.getElementByName(E,"DNA");
var sequence=E.textContent.replace(/[ \t\n\r]/g,"");
var pre=document.getElementById("dna876271863");
_this.removeAllChildren(pre);
for(var i=0;i < sequence.length;++i)
{
var base=sequence.substring(i,i+1);
if(i>0 && i%75==0) pre.appendChild(document.createElement("br"));
var span=document.createElement("span");
var css="gray";
switch(base.toUpperCase())
{
case "A": css="red";break;
case "T": css="green";break;
case "G": css="yellow";break;
case "C": css="blue";break;
}
span.setAttribute("style","color:"+css);
span.appendChild(document.createTextNode(base));
pre.appendChild(span);
}
};
xhr.onerror = function()
{
var pre=document.getElementById("dna876271863");
_this.removeAllChildren(pre);
pre.appendChild(document.createTextNode("ERROR"));
};
xhr.send();
}
};
view raw ucsc_cors.js hosted with ❤ by GitHub


That's it
Pierre

20 May 2014

A nodejs-based REST server for the UCSC @GenomeBrowser



Node.js provides a simple mechanism to write a REST server. As an exercise, I wrote a REST server for the mysql server of the UCSC genome bowser. The code is available on github at:


Starting the server


$ cd bionode
$ node ucsc/ucsc.js
Server running at http://localhost:8080/

METHOD: /schema/databases



Lists the available databases :e.g: http://localhost:8080/schemas/databases


[
"information_schema",
"ailMel1",
"allMis1",
"anoCar1",

(...)
"visiGene",
"xenTro1",
"xenTro2",
"xenTro3"
]


This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/schemas/databases?callback=handle


handle([
"information_schema",
"ailMel1",
"allMis1",
"anoCar1",
(...)
"visiGene",
"xenTro1",
"xenTro2",
"xenTro3"
]);


METHOD: /schema/:database/tables


Lists the available tables for a given database :e.g: http://localhost:8080/schemas/anoCar1/tables


[
"all_mrna",
"author",
"blastHg18KG",
"cds",
(...)
"xenoRefFlat",
"xenoRefGene",
"xenoRefSeqAli"
]

This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/schemas/anoCar1/tables?callback=handle


handle([
"all_mrna",
"author",
"blastHg18KG",
"cds",
"cell",
(...)
"xenoRefFlat",
"xenoRefGene",
"xenoRefSeqAli"
]);

METHOD: /schema/:database/:table


Returns a schema for the given database.table. E.g: http://localhost:8080/schemas/anoCar1/xenoMrna



{"database":"anoCar1","table":"xenoMrna","fields":[{"name":"bin","type":"smallint(5) unsigned","key":""},{"name":"matches","type":"int(10) unsigned","key":""},{"name":"misMatches","type":"int(10) unsigned","key":""},{"name":"repMatches","type":"int(10) unsigned","key":""},{"name":"nCount","type":"int(10) unsigned","key":""},{"name":"qNumInsert","type":"int(10) unsigned","key":""},{"name":"qBaseInsert","type":"int(10) unsigned","key":""},{"name":"tNumInsert","type":"int(10) unsigned","key":""},{"name":"tBaseInsert","type":"int(10) unsigned","key":""},{"name":"strand","type":"char(2)","key":""},{"name":"qName","type":"varchar(255)","key":"MUL"},{"name":"qSize","type":"int(10) unsigned","key":""},{"name":"qStart","type":"int(10) unsigned","key":""},{"name":"qEnd","type":"int(10) unsigned","key":""},{"name":"tName","type":"varchar(255)","key":"MUL"},{"name":"tSize","type":"int(10) unsigned","key":""},{"name":"tStart","type":"int(10) unsigned","key":""},{"name":"tEnd","type":"int(10) unsigned","key":""},{"name":"blockCount","type":"int(10) unsigned","key":""},{"name":"blockSizes","type":"longblob","key":""},{"name":"qStarts","type":"longblob","key":""},{"name":"tStarts","type":"longblob","key":""}]}


This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/schemas/anoCar1/xenoMrna?callback=handler


handler({"database":"anoCar1","table":"xenoMrna","fields":[{"name":"bin","type":"smallint(5) unsigned","key":""},{"name":"matches","type":"int(10) unsigned","key":""},{"name":"misMatches","type":"int(10) unsigned","key":""},{"name":"repMatches","type":"int(10) unsigned","key":""},{"name":"nCount","type":"int(10) unsigned","key":""},{"name":"qNumInsert","type":"int(10) unsigned","key":""},{"name":"qBaseInsert","type":"int(10) unsigned","key":""},{"name":"tNumInsert","type":"int(10) unsigned","key":""},{"name":"tBaseInsert","type":"int(10) unsigned","key":""},{"name":"strand","type":"char(2)","key":""},{"name":"qName","type":"varchar(255)","key":"MUL"},{"name":"qSize","type":"int(10) unsigned","key":""},{"name":"qStart","type":"int(10) unsigned","key":""},{"name":"qEnd","type":"int(10) unsigned","key":""},{"name":"tName","type":"varchar(255)","key":"MUL"},{"name":"tSize","type":"int(10) unsigned","key":""},{"name":"tStart","type":"int(10) unsigned","key":""},{"name":"tEnd","type":"int(10) unsigned","key":""},{"name":"blockCount","type":"int(10) unsigned","key":""},{"name":"blockSizes","type":"longblob","key":""},{"name":"qStarts","type":"longblob","key":""},{"name":"tStarts","type":"longblob","key":""}]});


METHOD: /ucsc/:database/:table/:column/:key


Fetch the rows for a given database.name having a :column==:key . The :column must be indexed. E.g: http://localhost:8080/ucsc/anoCar1/ensGene/name/ENSACAT00000004346


[
{"bin":592,"name":"ENSACAT00000004346","chrom":"scaffold_111","strand":"-","txStart":991522,"txEnd":996396,"cdsStart":991522,"cdsEnd":996396,"exonCount":3,"exonStarts":"991522,995669,995976,","exonEnds":"991954,995972,996396,","score":0,"name2":"PELO","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,0,"}
]

This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/ucsc/anoCar1/ensGene/name/ENSACAT00000004346?callback=handler


handler([
{"bin":592,"name":"ENSACAT00000004346","chrom":"scaffold_111","strand":"-","txStart":991522,"txEnd":996396,"cdsStart":991522,"cdsEnd":996396,"exonCount":3,"exonStarts":"991522,995669,995976,","exonEnds":"991954,995972,996396,","score":0,"name2":"PELO","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,0,"}
]);

METHOD: /ucsc/:database/:table?chrom=?&start=?&end=?




Fetch the rows for a given genomic database.name overlapping the given range. This method uses the UCSC-bin index if it is available; E.g: http://localhost:8080/ucsc/anoCar1/ensGene?chrom=scaffold_111&start=600000&end=900000


[
{"bin":589,"name":"ENSACAT00000003906","chrom":"scaffold_111","strand":"-","txStart":594783,"txEnd":614216,"cdsStart":595000,"cdsEnd":614201,"exonCount":9,"exonStarts":"594783,601291,601744,603640,604745,604865,609139,611740,614097,","exonEnds":"595105,601406,601813,603736,604771,604942,609173,611840,614216,","score":0,"name2":"DPM1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,2,2,2,0,1,0,2,0,"},
{"bin":589,"name":"ENSACAT00000003908","chrom":"scaffold_111","strand":"+","txStart":614382,"txEnd":615600,"cdsStart":614382,"cdsEnd":615600,"exonCount":1,"exonStarts":"614382,","exonEnds":"615600,","score":0,"name2":"MOCS3","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,"},
{"bin":589,"name":"ENSACAT00000003918","chrom":"scaffold_111","strand":"-","txStart":638920,"txEnd":642127,"cdsStart":638920,"cdsEnd":642127,"exonCount":2,"exonStarts":"638920,641368,","exonEnds":"639691,642127,","score":0,"name2":"KCNG1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,"},
{"bin":591,"name":"ENSACAT00000003920","chrom":"scaffold_111","strand":"+","txStart":814576,"txEnd":826972,"cdsStart":814576,"cdsEnd":826972,"exonCount":3,"exonStarts":"814576,825125,826845,","exonEnds":"814594,825247,826972,","score":0,"name2":"ENSACAG00000003945","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,0,2,"},
{"bin":591,"name":"ENSACAT00000004042","chrom":"scaffold_111","strand":"-","txStart":849731,"txEnd":881887,"cdsStart":849731,"cdsEnd":881887,"exonCount":24,"exonStarts":"849731,851343,855421,856165,857842,858090,861054,861943,862949,863773,865029,865639,867414,868216,872220,873601,874396,876850,877105,877711,878919,879681,881320,881738,","exonEnds":"849809,851460,855511,856279,857947,858201,861157,862027,863026,863866,865171,865722,867525,868368,872360,873738,874600,876994,877263,877850,878993,879847,881471,881887,","score":0,"name2":"ITGA2","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"0,0,0,0,0,0,2,2,0,0,2,0,0,1,2,0,0,0,1,0,1,0,2,0,"},
{"bin":591,"name":"ENSACAT00000004050","chrom":"scaffold_111","strand":"-","txStart":883724,"txEnd":897808,"cdsStart":883724,"cdsEnd":897808,"exonCount":5,"exonStarts":"883724,885433,889264,889742,897701,","exonEnds":"883858,885548,889356,889852,897808,","score":0,"name2":"ENSACAG00000004086","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"1,0,1,2,0,"}
]

This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/ucsc/anoCar1/ensGene?chrom=scaffold_111&start=600000&end=900000&callback=handler


handler([
{"bin":589,"name":"ENSACAT00000003906","chrom":"scaffold_111","strand":"-","txStart":594783,"txEnd":614216,"cdsStart":595000,"cdsEnd":614201,"exonCount":9,"exonStarts":"594783,601291,601744,603640,604745,604865,609139,611740,614097,","exonEnds":"595105,601406,601813,603736,604771,604942,609173,611840,614216,","score":0,"name2":"DPM1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,2,2,2,0,1,0,2,0,"},
{"bin":589,"name":"ENSACAT00000003908","chrom":"scaffold_111","strand":"+","txStart":614382,"txEnd":615600,"cdsStart":614382,"cdsEnd":615600,"exonCount":1,"exonStarts":"614382,","exonEnds":"615600,","score":0,"name2":"MOCS3","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,"},
{"bin":589,"name":"ENSACAT00000003918","chrom":"scaffold_111","strand":"-","txStart":638920,"txEnd":642127,"cdsStart":638920,"cdsEnd":642127,"exonCount":2,"exonStarts":"638920,641368,","exonEnds":"639691,642127,","score":0,"name2":"KCNG1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,"},
{"bin":591,"name":"ENSACAT00000003920","chrom":"scaffold_111","strand":"+","txStart":814576,"txEnd":826972,"cdsStart":814576,"cdsEnd":826972,"exonCount":3,"exonStarts":"814576,825125,826845,","exonEnds":"814594,825247,826972,","score":0,"name2":"ENSACAG00000003945","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,0,2,"},
{"bin":591,"name":"ENSACAT00000004042","chrom":"scaffold_111","strand":"-","txStart":849731,"txEnd":881887,"cdsStart":849731,"cdsEnd":881887,"exonCount":24,"exonStarts":"849731,851343,855421,856165,857842,858090,861054,861943,862949,863773,865029,865639,867414,868216,872220,873601,874396,876850,877105,877711,878919,879681,881320,881738,","exonEnds":"849809,851460,855511,856279,857947,858201,861157,862027,863026,863866,865171,865722,867525,868368,872360,873738,874600,876994,877263,877850,878993,879847,881471,881887,","score":0,"name2":"ITGA2","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"0,0,0,0,0,0,2,2,0,0,2,0,0,1,2,0,0,0,1,0,1,0,2,0,"},
{"bin":591,"name":"ENSACAT00000004050","chrom":"scaffold_111","strand":"-","txStart":883724,"txEnd":897808,"cdsStart":883724,"cdsEnd":897808,"exonCount":5,"exonStarts":"883724,885433,889264,889742,897701,","exonEnds":"883858,885548,889356,889852,897808,","score":0,"name2":"ENSACAG00000004086","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"1,0,1,2,0,"}
]);


That's it,

Pierre

15 May 2014

How I start a bioinformatics project

Phil Ashton tweeted a link to a paper about how to set up a bioinformatics project file hierarchy: " A Quick Guide to Organizing Computational Biology Projects ".

Nick Loman posted his version yesterday : "How I start a bioinformatics project" on http://nickloman.github.io/2014/05/14/how-i-start-a-bioinformatics-project/.

Here is mine (simplified):

  • I start by creating a directory managed by git
  • I create a JSON-based description of my data, including the path to the softwares, to the references
    {
    "reference": {
    "name": "ref",
    "fasta": "/path/to/ref.fasta"
    },
    "samples": [
    {
    "fastq": [
    "path/to/Sample1/Sample1_1.fq.gz",
    "path/to/Sample1/Sample1_2.fq.gz"
    ],
    "name": "Sample1"
    },
    {
    "fastq": [
    "path/to/Sample2/Sample2_1.fq.gz",
    "path/to/Sample2/Sample2_2.fq.gz"
    ],
    "name": "Sample2"
    },
    {
    "fastq": [
    "path/to/Sample3/Sample3_1.fq.gz",
    "path/to/Sample3/Sample3_2.fq.gz"
    ],
    "name": "Sample3"
    }
    ],
    "tools": {
    "bcftools": "/path/to/bcftools",
    "bwa": "/path/to/bwa",
    "samtools": "/path/to/samtools"
    }
    }
    view raw config.json hosted with ❤ by GitHub
  • I create a git submodule for a project hosting an Apache-velocity template transforming a Makefile from config.json :
    REF=${config.reference.fasta}
    .PHONY:all
    all: align/variants.vcf
    align/variants.vcf: #foreach($sample in ${config.samples}) align/${sample.name}_sorted.bam #end
    ${config.tools.samtools} mpileup -uf ${REF} $^ |\
    ${config.tools.bcftools} view -vcg - >$@
    #foreach($sample in ${config.samples})
    align/${sample.name}_sorted.bam : ${sample.fastq[0]} ${sample.fastq[1]}
    mkdir -p $(dir $@) && \
    ${config.tools.bwa} mem -R '@RG\tID:${sample.getId()}\tSM:${sample.name}' ${REF} $^ |\
    ${config.tools.samtools} view -b -S - |\
    ${config.tools.samtools} sort - $(basename $@) && \
    ${config.tools.samtools} index $@
    #end
    view raw make.vm hosted with ❤ by GitHub
  • The Makefile is generated using jsvelocity :
    java -jar jsvelocity.jar -f config config.json make.vm > Makefile
    view raw exectute.bash hosted with ❤ by GitHub
    It produces the following Makefile:
    REF=/path/to/ref.fasta
    .PHONY:all
    all: align/variants.vcf
    align/variants.vcf: align/Sample1_sorted.bam align/Sample2_sorted.bam align/Sample3_sorted.bam
    /path/to/samtools mpileup -uf ${REF} $^ |\
    /path/to/bcftools view -vcg - >$@
    align/Sample1_sorted.bam : path/to/Sample1/Sample1_1.fq.gz path/to/Sample1/Sample1_2.fq.gz
    mkdir -p $(dir $@) && \
    /path/to/bwa mem -R '@RG\tID:id10\tSM:Sample1' ${REF} $^ |\
    /path/to/samtools view -b -S - |\
    /path/to/samtools sort - $(basename $@) && \
    /path/to/samtools index $@
    align/Sample2_sorted.bam : path/to/Sample2/Sample2_1.fq.gz path/to/Sample2/Sample2_2.fq.gz
    mkdir -p $(dir $@) && \
    /path/to/bwa mem -R '@RG\tID:id15\tSM:Sample2' ${REF} $^ |\
    /path/to/samtools view -b -S - |\
    /path/to/samtools sort - $(basename $@) && \
    /path/to/samtools index $@
    align/Sample3_sorted.bam : path/to/Sample3/Sample3_1.fq.gz path/to/Sample3/Sample3_2.fq.gz
    mkdir -p $(dir $@) && \
    /path/to/bwa mem -R '@RG\tID:id20\tSM:Sample3' ${REF} $^ |\
    /path/to/samtools view -b -S - |\
    /path/to/samtools sort - $(basename $@) && \
    /path/to/samtools index $@
    view raw Makefile hosted with ❤ by GitHub
  • The Makefile is invoked with option -j N(Allow N jobs at once) using GNU-Make or QMake(distributed parallel make, scheduled by Sun Grid Engine)

That's it,

Pierre

12 May 2014

Generating wikipedia semantic links from a pubmed-id

In "Building a biomedical semantic network in Wikipedia with Semantic Wiki Links" (Database . 2012 Mar 20;2012) Benjamin Good & al. introduced the Semantic Wiki Link (SWL):

An SWL is a hyperlink on Wikipedia that allows the editor to explicitly specify the type of relationship between the concept described on the page being edited and the concept that is being linked to (http://en.wikipedia.org/wiki/Template:SWL). These SWLs are implemented using MediaWiki templates.
(...)
any programmer can now write computer programs to parse Wikipedia content for SWLs and import them into third-party tools (e.g. triplestores, etc.)
Example: Phospholamban:
The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase ({{SWL|type=substrate_for|target=protein kinase A|label=PKA}}) in cardiac muscle.




Using Entrez-Ajax (Loman & al.) and the Wikipedia API, I wrote a HTML+JS interface to accelerate the creation of a semantic SWL wiki-text from a PUBMED-id:


and.. well, that's it,

Pierre