Showing posts with label ucsc. Show all posts

30 October 2014

Visualizing @GenomeBrowser liftOver/chain files using animated #SVG

I wrote a tool to visualize some UCSC "chain/liftOver" files as an animated SVG file. This tool is available on github at:

"A liftOver file is a chain file, where for each region in the genome the alignments of the best/longest syntenic regions are used to translate features from one version of a genome to another."

SVG elements and CSS styles can be animated in an SVG file (see http://www.w3.org/TR/SVG/animate.html ) using the <animate/> element.

For example the following SVG snippet

  • defines a rectangle (x=351, y=35, width=6, height=15).
  • at t=60s the opacity changes from 0 to 0.7 over 2 seconds
  • the position 'x' moves from x=351 to x=350, starting at t=62s and lasting 16 seconds
  • the position 'y' moves from y=35 to y=36, starting at t=62s and lasting 16 seconds
  • the 'width' grows from width=6 to width=100, starting at t=62s and lasting 16 seconds
  • at t=78s the opacity changes from 0.7 back to 0 over 2 seconds
<rect x="351" y="35" width="6" height="15">
        <animate attributeType="CSS" attributeName="opacity" begin="60" dur="2" from="0" to="0.7" repeatCount="1" fill="freeze"/>
        <animate attributeType="XML" attributeName="x" begin="62" dur="16" from="351" to="350" repeatCount="1" fill="freeze"/>
        <animate attributeType="XML" attributeName="y" begin="62" dur="16" from="35" to="36" repeatCount="1" fill="freeze"/>
        <animate attributeType="XML" attributeName="width" begin="62" dur="16" from="6" to="100" repeatCount="1" fill="freeze"/>
        <animate attributeType="CSS" attributeName="opacity" begin="78" dur="2" from="0.7" to="0" repeatCount="1" fill="freeze"/>
</rect>
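As a rough sketch of how such a snippet could be generated programmatically, the following Python code builds the same <rect> with its five <animate/> children. The function names 'animate' and 'animated_rect' and their parameters are hypothetical and are not taken from the tool itself; only the attribute names and values come from the snippet above.

```python
import xml.etree.ElementTree as ET


def animate(parent, attribute_type, name, begin, dur, start, end):
    """Append one <animate/> child with the attributes used in the snippet."""
    return ET.SubElement(parent, "animate", {
        "attributeType": attribute_type,
        "attributeName": name,
        "begin": str(begin),
        "dur": str(dur),
        "from": str(start),
        "to": str(end),
        "repeatCount": "1",
        "fill": "freeze",
    })


def animated_rect(x, y, width, height, new_x, new_y, new_width,
                  t0, fade=2, move=16):
    """Fade a rectangle in at t0, move/resize it, then fade it out."""
    rect = ET.Element("rect", {
        "x": str(x), "y": str(y),
        "width": str(width), "height": str(height),
    })
    # fade in
    animate(rect, "CSS", "opacity", t0, fade, 0, 0.7)
    # move and resize once the fade-in is over
    t1 = t0 + fade
    animate(rect, "XML", "x", t1, move, x, new_x)
    animate(rect, "XML", "y", t1, move, y, new_y)
    animate(rect, "XML", "width", t1, move, width, new_width)
    # fade out once the move is over
    animate(rect, "CSS", "opacity", t1 + move, fade, 0.7, 0)
    return rect


rect = animated_rect(351, 35, 6, 15, 350, 36, 100, t0=60)
print(ET.tostring(rect, encoding="unicode"))
```

Computing each 'begin' from the previous step keeps the fade-in, move and fade-out in sequence when the durations change.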

A demo hg16-hg17-hg18-hg19-hg38 was posted here: http://cardioserve.nantes.inserm.fr/~lindenb/liftover2svg/hg16ToHg38.svg




That's it,

Pierre.

30 September 2014

Using the Ensembl Regulatory Build to annotate some VCF files

via UCSC Genome Browser project announcements: "Data from the Ensembl Regulatory Build are now available in the UCSC Genome Browser as a public track hub for both hg19 and hg38. This track hub contains promoters and their flanking regions, enhancers, and many other regulatory features predicted across a number of cell lines using annotated segmentation states".
For example looking at chr21:33037019-33037021 returns the following screen:

Those new annotations are deployed by the Sanger Institute as a UCSC track hub. By the way, those files can be directly handled using the UCSC standalone tools:
$ bigWigSummary -type=mean -udcDir=.  \
  "http://ngs.sanger.ac.uk/production/ensembl/regulation//hg19/segmentation_summaries/Segway_17/1.bw" \
  chr1 1  110301 1

1.23587
I wrote a Java tool to annotate VCFs with those files. This tool uses the BigWig library for Java ( https://code.google.com/p/bigwig/ ) and is available at: https://github.com/lindenb/jvarkit/wiki/VcfEnsemblReg.
Here is an example with the following VCF:
##fileformat=VCFv4.1
(...)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr21 33037029 . C T 6.20 . . GT:PL:DP:GQ 1/1:35,3,0:1:4
VcfEnsemblReg is invoked:
$  java -jar dist/vcfensemblreg.jar in.vcf > out.vcf
Here is the content of out.vcf:
##fileformat=VCFv4.1
##INFO=<ID=AP2ALPHA,Number=1,Type=Float,Description="Overlap summary of AP2ALPHA ChipSeq binding peaks across available datasets http://ngs.sanger.ac.uk/production/ensembl/regulation//hg19/tfbs/AP2ALPHA.bw">
##INFO=<ID=AP2GAMMA,Number=1,Type=Float,Description="Overlap summary of AP2GAMMA ChipSeq binding peaks across available datasets http://ngs.sanger.ac.uk/production/ensembl/regulation//hg19/tfbs/AP2GAMMA.bw">
##INFO=<ID=ATF3,Number=1,Type=Float,Description="Overlap summary of ATF3 ChipSeq binding peaks across available datasets http://ngs.sanger.ac.uk/production/ensembl/regulation//hg19/tfbs/ATF3.bw">
##INFO=<ID=BAF155,Number=1,Type=Float,Description="Overlap summary of BAF155 ChipSeq binding peaks across available datasets http://ngs.sanger.ac.uk/production/ensembl/regulation//hg19/tfbs/BAF155.bw">
##INFO=<ID=BAF170,Number=1,Type=Float,Description="Overlap summary of BAF170 ChipSeq binding peaks across available datasets http://ngs.sanger.ac.uk/production/ensembl/regulation//hg19/tfbs/BAF170.bw">
(...)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr21 33037029 . C T 6.20 . BuildOverview=ctcf_45704|CTCFBindingSite;Segway_17_1=3.0;Segway_17_14=7.0;Segway_17_24=3.0;Segway_17_6=1.0;Segway_17_7=2.0;Segway_17_8=1.0;Segway_17_A549_projected=ctcf_45704|InactiveRegions;Segway_17_A549_segments=14_gene_79558|TranscriptionAssociated;Segway_17_DND41_projected=ctcf_45704|InactiveRegions;Segway_17_DND41_segments=1_distal_17115|DistalEnhancer;Segway_17_GM12878_projected=ctcf_45704|InactiveRegions;Segway_17_GM12878_segments=1_distal_29075|DistalEnhancer;Segway_17_H1HESC_projected=ctcf_45704|ActiveCTCFBindingSite;Segway_17_H1HESC_segments=8_ctcf_27831|DistalCTF;Segway_17_HELAS3_projected=ctcf_45704|InactiveRegions;Segway_17_HELAS3_segments=6_distal_76536|DistalEnhancer;Segway_17_HEPG2_projected=ctcf_45704|InactiveRegions;Segway_17_HEPG2_segments=1_distal_21535|DistalEnhancer;Segway_17_HMEC_projected=ctcf_45704|InactiveRegions;Segway_17_HMEC_segments=14_gene_44998|TranscriptionAssociated;Segway_17_HSMMT_projected=ctcf_45704|InactiveRegions;Segway_17_HSMMT_segments=24_gene_70780|TranscriptionAssociated;Segway_17_HSMM_projected=ctcf_45704|InactiveRegions;Segway_17_HSMM_segments=24_gene_80902|TranscriptionAssociated;Segway_17_HUVEC_projected=ctcf_45704|InactiveRegions;Segway_17_K562_projected=ctcf_45704|InactiveRegions;Segway_17_K562_segments=14_gene_68692|TranscriptionAssociated;Segway_17_MONO_projected=ctcf_45704|InactiveRegions;Segway_17_MONO_segments=14_gene_35200|TranscriptionAssociated;Segway_17_NHA_projected=ctcf_45704|InactiveRegions;Segway_17_NHDFAD_projected=ctcf_45704|InactiveRegions;Segway_17_NHDFAD_segments=14_gene_57366|TranscriptionAssociated;Segway_17_NHEK_projected=ctcf_45704|InactiveRegions;Segway_17_NHEK_segments=24_gene_95458|TranscriptionAssociated;Segway_17_NHLF_projected=ctcf_45704|InactiveRegions;Segway_17_NHLF_segments=14_gene_59524|TranscriptionAssociated;Segway_17_OSTEO_projected=ctcf_45704|InactiveRegions;Segway_17_OSTEO_segments=14_gene_61575|TranscriptionAssociated GT:PL:DP:GQ 1/1:35,3,0:1:4
Here are the new fields in the INFO column:
Segway_17_1 3.0
Segway_17_14 7.0
Segway_17_24 3.0
Segway_17_6 1.0
Segway_17_7 2.0
Segway_17_8 1.0
Segway_17_A549_projected ctcf_45704|InactiveRegions
Segway_17_A549_segments 14_gene_79558|TranscriptionAssociated
Segway_17_DND41_projected ctcf_45704|InactiveRegions
Segway_17_DND41_segments 1_distal_17115|DistalEnhancer
Segway_17_GM12878_projected ctcf_45704|InactiveRegions
Segway_17_GM12878_segments 1_distal_29075|DistalEnhancer
Segway_17_H1HESC_projected ctcf_45704|ActiveCTCFBindingSite
Segway_17_H1HESC_segments 8_ctcf_27831|DistalCTF
Segway_17_HELAS3_projected ctcf_45704|InactiveRegions
Segway_17_HELAS3_segments 6_distal_76536|DistalEnhancer
Segway_17_HEPG2_projected ctcf_45704|InactiveRegions
Segway_17_HEPG2_segments 1_distal_21535|DistalEnhancer
Segway_17_HMEC_projected ctcf_45704|InactiveRegions
Segway_17_HMEC_segments 14_gene_44998|TranscriptionAssociated
Segway_17_HSMMT_projected ctcf_45704|InactiveRegions
Segway_17_HSMMT_segments 24_gene_70780|TranscriptionAssociated
Segway_17_HSMM_projected ctcf_45704|InactiveRegions
Segway_17_HSMM_segments 24_gene_80902|TranscriptionAssociated
Segway_17_HUVEC_projected ctcf_45704|InactiveRegions
Segway_17_K562_projected ctcf_45704|InactiveRegions
Segway_17_K562_segments 14_gene_68692|TranscriptionAssociated
Segway_17_MONO_projected ctcf_45704|InactiveRegions
Segway_17_MONO_segments 14_gene_35200|TranscriptionAssociated
Segway_17_NHA_projected ctcf_45704|InactiveRegions
Segway_17_NHDFAD_projected ctcf_45704|InactiveRegions
Segway_17_NHDFAD_segments 14_gene_57366|TranscriptionAssociated
Segway_17_NHEK_projected ctcf_45704|InactiveRegions
Segway_17_NHEK_segments 24_gene_95458|TranscriptionAssociated
Segway_17_NHLF_projected ctcf_45704|InactiveRegions
Segway_17_NHLF_segments 14_gene_59524|TranscriptionAssociated
Segway_17_OSTEO_projected ctcf_45704|InactiveRegions
Segway_17_OSTEO_segments 14_gene_61575|TranscriptionAssociated
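One possible way to explore these annotations is to split the INFO column into key/value pairs and keep the cell lines where the projected state is not 'InactiveRegions'. This is only a minimal Python sketch: the function 'parse_info' is hypothetical, is not part of VcfEnsemblReg, and assumes that INFO values contain no ';'.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# A shortened version of the INFO column shown above.
info = ("BuildOverview=ctcf_45704|CTCFBindingSite;Segway_17_1=3.0;"
        "Segway_17_A549_projected=ctcf_45704|InactiveRegions;"
        "Segway_17_H1HESC_projected=ctcf_45704|ActiveCTCFBindingSite")

fields = parse_info(info)

# Keep the '*_projected' annotations that are not flagged as inactive.
active = {
    key: value
    for key, value in fields.items()
    if key.endswith("_projected") and not value.endswith("|InactiveRegions")
}
print(active)
```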

OK, now I've got a VCF containing those 'Ensembl Regulatory' annotations. What can I do with this? I currently have no idea :-)

That's it,
Pierre

22 May 2014

Breaking the "same origin security policy" with CORS. An example with @GenomeBrowser / DAS.

Jerven Bolleman recently taught me about CORS (Cross-origin resource sharing):



"Cross-origin resource sharing (CORS) is a mechanism that allows many resources (e.g. fonts, JavaScript, etc.) on a web page to be requested from another domain outside of the domain the resource originated from. In particular, JavaScript's AJAX calls can use the XMLHttpRequest mechanism. Such "cross-domain" requests would otherwise be forbidden by web browsers, per the same origin security policy."

I've created a page testing whether some bioinformatics web services support CORS. This page is available at: http://lindenb.github.io/pages/cors/index.html
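In essence, a service supports CORS for a given page when its response carries an 'Access-Control-Allow-Origin' header that is either '*' or the origin of the calling page. The Python sketch below illustrates that check on a plain dictionary of response headers. The function name 'allows_origin' is hypothetical, and the check deliberately ignores preflight requests and credentials, so it is a simplification rather than what a browser fully implements.

```python
def allows_origin(response_headers, origin):
    """Return True if the headers permit a simple cross-origin read."""
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {
        name.lower(): value.strip()
        for name, value in response_headers.items()
    }
    allowed = headers.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin


page = "http://lindenb.github.io"
print(allows_origin({"Access-Control-Allow-Origin": "*"}, page))
print(allows_origin({"Content-Type": "text/xml"}, page))
```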

Interestingly, NCBI, Uniprot and UCSC support CORS. As an example, the following <form> fetches a DNA sequence using the DAS server of the UCSC and displays it:



The script:



That's it
Pierre

20 May 2014

A nodejs-based REST server for the UCSC @GenomeBrowser



Node.js provides a simple mechanism to write a REST server. As an exercise, I wrote a REST server for the MySQL server of the UCSC genome browser. The code is available on github at:


Starting the server


$ cd bionode
$ node ucsc/ucsc.js
Server running at http://localhost:8080/

METHOD: /schemas/databases



Lists the available databases, e.g. http://localhost:8080/schemas/databases


[
"information_schema",
"ailMel1",
"allMis1",
"anoCar1",

(...)
"visiGene",
"xenTro1",
"xenTro2",
"xenTro3"
]


This method accepts a 'callback' parameter for JSON-P, e.g. http://localhost:8080/schemas/databases?callback=handle


handle([
"information_schema",
"ailMel1",
"allMis1",
"anoCar1",
(...)
"visiGene",
"xenTro1",
"xenTro2",
"xenTro3"
]);
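The JSON-P variant simply wraps the JSON payload in a call to the function named by the 'callback' parameter. The server is written in JavaScript; the following Python sketch only illustrates the principle. The function 'jsonp' is hypothetical, and the check of the callback name is an added precaution that is not described in the post.

```python
import json
import re

# Only plain identifiers are accepted as callback names.
_IDENTIFIER = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def jsonp(payload, callback=None):
    """Serialise payload as JSON, wrapped in callback(...) if requested."""
    body = json.dumps(payload)
    if callback is None:
        return body
    if not _IDENTIFIER.match(callback):
        raise ValueError("invalid callback name: %r" % callback)
    return "%s(%s);" % (callback, body)


print(jsonp(["ailMel1", "allMis1"]))
print(jsonp(["ailMel1", "allMis1"], callback="handle"))
```

Rejecting anything that is not a plain identifier avoids echoing arbitrary script back to the browser.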


METHOD: /schemas/:database/tables


Lists the available tables for a given database, e.g. http://localhost:8080/schemas/anoCar1/tables


[
"all_mrna",
"author",
"blastHg18KG",
"cds",
(...)
"xenoRefFlat",
"xenoRefGene",
"xenoRefSeqAli"
]

This method accepts a 'callback' parameter for JSON-P, e.g. http://localhost:8080/schemas/anoCar1/tables?callback=handle


handle([
"all_mrna",
"author",
"blastHg18KG",
"cds",
"cell",
(...)
"xenoRefFlat",
"xenoRefGene",
"xenoRefSeqAli"
]);

METHOD: /schemas/:database/:table


Returns the schema of the given database.table, e.g. http://localhost:8080/schemas/anoCar1/xenoMrna



{"database":"anoCar1","table":"xenoMrna","fields":[{"name":"bin","type":"smallint(5) unsigned","key":""},{"name":"matches","type":"int(10) unsigned","key":""},{"name":"misMatches","type":"int(10) unsigned","key":""},{"name":"repMatches","type":"int(10) unsigned","key":""},{"name":"nCount","type":"int(10) unsigned","key":""},{"name":"qNumInsert","type":"int(10) unsigned","key":""},{"name":"qBaseInsert","type":"int(10) unsigned","key":""},{"name":"tNumInsert","type":"int(10) unsigned","key":""},{"name":"tBaseInsert","type":"int(10) unsigned","key":""},{"name":"strand","type":"char(2)","key":""},{"name":"qName","type":"varchar(255)","key":"MUL"},{"name":"qSize","type":"int(10) unsigned","key":""},{"name":"qStart","type":"int(10) unsigned","key":""},{"name":"qEnd","type":"int(10) unsigned","key":""},{"name":"tName","type":"varchar(255)","key":"MUL"},{"name":"tSize","type":"int(10) unsigned","key":""},{"name":"tStart","type":"int(10) unsigned","key":""},{"name":"tEnd","type":"int(10) unsigned","key":""},{"name":"blockCount","type":"int(10) unsigned","key":""},{"name":"blockSizes","type":"longblob","key":""},{"name":"qStarts","type":"longblob","key":""},{"name":"tStarts","type":"longblob","key":""}]}


This method accepts a 'callback' parameter for JSON-P, e.g. http://localhost:8080/schemas/anoCar1/xenoMrna?callback=handler


handler({"database":"anoCar1","table":"xenoMrna","fields":[{"name":"bin","type":"smallint(5) unsigned","key":""},{"name":"matches","type":"int(10) unsigned","key":""},{"name":"misMatches","type":"int(10) unsigned","key":""},{"name":"repMatches","type":"int(10) unsigned","key":""},{"name":"nCount","type":"int(10) unsigned","key":""},{"name":"qNumInsert","type":"int(10) unsigned","key":""},{"name":"qBaseInsert","type":"int(10) unsigned","key":""},{"name":"tNumInsert","type":"int(10) unsigned","key":""},{"name":"tBaseInsert","type":"int(10) unsigned","key":""},{"name":"strand","type":"char(2)","key":""},{"name":"qName","type":"varchar(255)","key":"MUL"},{"name":"qSize","type":"int(10) unsigned","key":""},{"name":"qStart","type":"int(10) unsigned","key":""},{"name":"qEnd","type":"int(10) unsigned","key":""},{"name":"tName","type":"varchar(255)","key":"MUL"},{"name":"tSize","type":"int(10) unsigned","key":""},{"name":"tStart","type":"int(10) unsigned","key":""},{"name":"tEnd","type":"int(10) unsigned","key":""},{"name":"blockCount","type":"int(10) unsigned","key":""},{"name":"blockSizes","type":"longblob","key":""},{"name":"qStarts","type":"longblob","key":""},{"name":"tStarts","type":"longblob","key":""}]});


METHOD: /ucsc/:database/:table/:column/:key


Fetches the rows of a given database.table where :column equals :key. The :column must be indexed. E.g. http://localhost:8080/ucsc/anoCar1/ensGene/name/ENSACAT00000004346


[
{"bin":592,"name":"ENSACAT00000004346","chrom":"scaffold_111","strand":"-","txStart":991522,"txEnd":996396,"cdsStart":991522,"cdsEnd":996396,"exonCount":3,"exonStarts":"991522,995669,995976,","exonEnds":"991954,995972,996396,","score":0,"name2":"PELO","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,0,"}
]

This method accepts a 'callback' parameter for JSON-P, e.g. http://localhost:8080/ucsc/anoCar1/ensGene/name/ENSACAT00000004346?callback=handler


handler([
{"bin":592,"name":"ENSACAT00000004346","chrom":"scaffold_111","strand":"-","txStart":991522,"txEnd":996396,"cdsStart":991522,"cdsEnd":996396,"exonCount":3,"exonStarts":"991522,995669,995976,","exonEnds":"991954,995972,996396,","score":0,"name2":"PELO","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,0,"}
]);

METHOD: /ucsc/:database/:table?chrom=?&start=?&end=?




Fetches the rows of a given genomic database.table that overlap the given range. This method uses the UCSC 'bin' index if it is available. E.g. http://localhost:8080/ucsc/anoCar1/ensGene?chrom=scaffold_111&start=600000&end=900000


[
{"bin":589,"name":"ENSACAT00000003906","chrom":"scaffold_111","strand":"-","txStart":594783,"txEnd":614216,"cdsStart":595000,"cdsEnd":614201,"exonCount":9,"exonStarts":"594783,601291,601744,603640,604745,604865,609139,611740,614097,","exonEnds":"595105,601406,601813,603736,604771,604942,609173,611840,614216,","score":0,"name2":"DPM1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,2,2,2,0,1,0,2,0,"},
{"bin":589,"name":"ENSACAT00000003908","chrom":"scaffold_111","strand":"+","txStart":614382,"txEnd":615600,"cdsStart":614382,"cdsEnd":615600,"exonCount":1,"exonStarts":"614382,","exonEnds":"615600,","score":0,"name2":"MOCS3","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,"},
{"bin":589,"name":"ENSACAT00000003918","chrom":"scaffold_111","strand":"-","txStart":638920,"txEnd":642127,"cdsStart":638920,"cdsEnd":642127,"exonCount":2,"exonStarts":"638920,641368,","exonEnds":"639691,642127,","score":0,"name2":"KCNG1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,"},
{"bin":591,"name":"ENSACAT00000003920","chrom":"scaffold_111","strand":"+","txStart":814576,"txEnd":826972,"cdsStart":814576,"cdsEnd":826972,"exonCount":3,"exonStarts":"814576,825125,826845,","exonEnds":"814594,825247,826972,","score":0,"name2":"ENSACAG00000003945","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,0,2,"},
{"bin":591,"name":"ENSACAT00000004042","chrom":"scaffold_111","strand":"-","txStart":849731,"txEnd":881887,"cdsStart":849731,"cdsEnd":881887,"exonCount":24,"exonStarts":"849731,851343,855421,856165,857842,858090,861054,861943,862949,863773,865029,865639,867414,868216,872220,873601,874396,876850,877105,877711,878919,879681,881320,881738,","exonEnds":"849809,851460,855511,856279,857947,858201,861157,862027,863026,863866,865171,865722,867525,868368,872360,873738,874600,876994,877263,877850,878993,879847,881471,881887,","score":0,"name2":"ITGA2","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"0,0,0,0,0,0,2,2,0,0,2,0,0,1,2,0,0,0,1,0,1,0,2,0,"},
{"bin":591,"name":"ENSACAT00000004050","chrom":"scaffold_111","strand":"-","txStart":883724,"txEnd":897808,"cdsStart":883724,"cdsEnd":897808,"exonCount":5,"exonStarts":"883724,885433,889264,889742,897701,","exonEnds":"883858,885548,889356,889852,897808,","score":0,"name2":"ENSACAG00000004086","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"1,0,1,2,0,"}
]
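The 'bin' column seen in the rows above follows the standard UCSC binning scheme: five levels of bins, the smallest covering 128 kb (2^17 bases), each level being 8 times larger than the previous one. The Python sketch below computes the bin of a feature and the list of bins that a query range can overlap. It is an illustration of the scheme only, not the code of the server, and the names 'bin_from_range' and 'overlapping_bins' are hypothetical. The extended scheme used for coordinates above 512 Mb is not covered.

```python
# Offsets of the first bin of each level, from the smallest bins (128 kb)
# to the single bin covering 512 Mb.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2^17 = 128 kb, size of the smallest bins
BIN_NEXT_SHIFT = 3    # each level is 2^3 = 8 times larger


def bin_from_range(start, end):
    """Smallest bin fully containing the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range out of bounds for the standard binning scheme")


def overlapping_bins(start, end):
    """All bins, at every level, that may hold features overlapping the range."""
    bins = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins


# First row of the result above: txStart=594783, txEnd=614216, bin=589
print(bin_from_range(594783, 614216))
# Bins to scan for start=600000, end=900000
print(overlapping_bins(600000, 900000))
```

A query would then typically restrict the rows to 'bin IN (...)' for these bins before testing 'txStart < end AND txEnd > start', so that the index on the bin column is used instead of a full table scan.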

This method accepts a 'callback' parameter for JSON-P, e.g. http://localhost:8080/ucsc/anoCar1/ensGene?chrom=scaffold_111&start=600000&end=900000&callback=handler


handler([
{"bin":589,"name":"ENSACAT00000003906","chrom":"scaffold_111","strand":"-","txStart":594783,"txEnd":614216,"cdsStart":595000,"cdsEnd":614201,"exonCount":9,"exonStarts":"594783,601291,601744,603640,604745,604865,609139,611740,614097,","exonEnds":"595105,601406,601813,603736,604771,604942,609173,611840,614216,","score":0,"name2":"DPM1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,2,2,2,0,1,0,2,0,"},
{"bin":589,"name":"ENSACAT00000003908","chrom":"scaffold_111","strand":"+","txStart":614382,"txEnd":615600,"cdsStart":614382,"cdsEnd":615600,"exonCount":1,"exonStarts":"614382,","exonEnds":"615600,","score":0,"name2":"MOCS3","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,"},
{"bin":589,"name":"ENSACAT00000003918","chrom":"scaffold_111","strand":"-","txStart":638920,"txEnd":642127,"cdsStart":638920,"cdsEnd":642127,"exonCount":2,"exonStarts":"638920,641368,","exonEnds":"639691,642127,","score":0,"name2":"KCNG1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,"},
{"bin":591,"name":"ENSACAT00000003920","chrom":"scaffold_111","strand":"+","txStart":814576,"txEnd":826972,"cdsStart":814576,"cdsEnd":826972,"exonCount":3,"exonStarts":"814576,825125,826845,","exonEnds":"814594,825247,826972,","score":0,"name2":"ENSACAG00000003945","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,0,2,"},
{"bin":591,"name":"ENSACAT00000004042","chrom":"scaffold_111","strand":"-","txStart":849731,"txEnd":881887,"cdsStart":849731,"cdsEnd":881887,"exonCount":24,"exonStarts":"849731,851343,855421,856165,857842,858090,861054,861943,862949,863773,865029,865639,867414,868216,872220,873601,874396,876850,877105,877711,878919,879681,881320,881738,","exonEnds":"849809,851460,855511,856279,857947,858201,861157,862027,863026,863866,865171,865722,867525,868368,872360,873738,874600,876994,877263,877850,878993,879847,881471,881887,","score":0,"name2":"ITGA2","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"0,0,0,0,0,0,2,2,0,0,2,0,0,1,2,0,0,0,1,0,1,0,2,0,"},
{"bin":591,"name":"ENSACAT00000004050","chrom":"scaffold_111","strand":"-","txStart":883724,"txEnd":897808,"cdsStart":883724,"cdsEnd":897808,"exonCount":5,"exonStarts":"883724,885433,889264,889742,897701,","exonEnds":"883858,885548,889356,889852,897808,","score":0,"name2":"ENSACAG00000004086","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"1,0,1,2,0,"}
]);


That's it,

Pierre

30 January 2014

Parallelizing #RStats using #make

In the current post, I'll show how to use R as the main SHELL of GNU-Make instead of using a classical Linux shell like 'bash'. Why would you do this?

  • awesomeness
  • Make-based workflow management
  • Make-based execution with --jobs. GNU make knows how to execute several recipes at once. Normally, make will execute only one recipe at a time, waiting for it to finish before executing the next. However, the '-j' or '--jobs' option tells make to execute many recipes simultaneously.
The following recipe has been tested with GNU-Make 4.0 and I'm not sure it would work with '<=3.81'.

The only problem is that R doesn't accept a multi-line argument on the command line (see http://stackoverflow.com/questions/21442674), so I created a wrapper 'mockR' that saves the argument '-e "code"' into a file and pipes it into R:

(Edit1: A comment from madscientist: "Re your script; you can save yourself some wear-and-tear on your disk and avoid the need for temp files and cleanup by just piping the input directly: echo "$R" | R --vanilla --no-readline --quiet . Just a thought.")

(Edit2: the exit value of 'R' should also be returned by 'mockR'.)
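To make the idea concrete, here is a minimal Python sketch of such a wrapper that takes both edits into account: the code is piped directly to R without a temporary file, and the exit value of R is returned. It is not the 'mockR' script itself; the function names 'extract_code' and 'run' are hypothetical, and it assumes that 'R' is available in the PATH.

```python
import subprocess
import sys

# Options taken from the comment quoted in Edit1.
R_COMMAND = ["R", "--vanilla", "--no-readline", "--quiet"]


def extract_code(argv):
    """Return the R code that follows the first '-e' flag."""
    if "-e" not in argv:
        raise ValueError('expected arguments: -e "code"')
    position = argv.index("-e")
    if position + 1 >= len(argv):
        raise ValueError("missing code after -e")
    return argv[position + 1]


def run(argv):
    """Pipe the code to R on its standard input and return R's exit value."""
    code = extract_code(argv)
    completed = subprocess.run(R_COMMAND, input=code, text=True)
    return completed.returncode


# make calls the wrapper as: ./mockR -e "all the lines of the recipe"
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run(sys.argv[1:]))
```

Returning R's exit value matters because make stops the build as soon as a recipe fails.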

This file is set as executable:
$ chmod u+x ./mockR
In the makefile, we tell 'make' to use 'mockR' instead of the default '/bin/sh':
SHELL = ./mockR
The R code will be passed to 'mockR' using the argument '-e "code"'
.SHELLFLAGS= -e
We also set 'ONESHELL': "If .ONESHELL is mentioned as a target, then when a target is built all lines of the recipe will be given to a single invocation of the shell rather than each line being invoked separately"
.ONESHELL:

Example 1

We download the table 'knownGene' from the UCSC and we plot a pdf file 'countExons=f(txStart)'. Please note that the targets are created using some R statements, NOT bash statements:

Now invoke make


Example 2

Using the 'eval' and 'call' functions, we can make the previous 'Makefile' applicable to all the chromosomes:

Now invoke make USING THREE PARALLEL JOBS





You can now watch the final pdf files:




That's it,
Pierre

Mapping the UCSC/Web-Sequences to a world map.

People at the UCSC have recently released a new track for the GenomeBrowser:


"We're pleased to announce the release of the Web Sequences track on the UCSC Genome Browser. This track, produced in collaboration with Microsoft Research, contains the results of a 30-day scan for DNA sequences from over 40 billion different webpages. The sequences were then mapped with Blat to the human genome (...) The data were extracted from a variety of sources including patents, online textbooks, help forums, and any other webpages that contain DNA sequence. In essence, this track displays the Blat alignments of nearly every DNA sequence on the internet!"

I've mapped each genomic location from this track to a country and generated the following (unreadable) picture:

How this picture was generated

  • I've downloaded the data from the UCSC using the Table browser. The data look like this:
    #bin    chrom   chromStart      chromEnd        name    score   strand  thickStart      thickEnd        reserved        blockCount      blockSizes      chromStarts     tSeqTypes     seqIds  seqRanges       publisher       pmid    doi     issn    journal title   firstAuthor     year    impact  classes locus
    585     chr1    14789   15004   3500336380      75              14789   15004   8421504 2       40,35   0,180   g       350033638000000000      0-75                          Tophat, Cufflinks and replicates - Page 2 - SEQanswers  seqanswers.com  0       0               WASH2P,WASH7P
    585     chr1    15017   15590   3500327042      381             15017   15590   8421504 2       326,55  0,518   g       350032704200000008      0-747                        Research Technologies at Indiana University      biomedapp.iu.edu        0       0               WASH7P
    585     chr1    68858   68895   3500020489      37              68858   68895   8421504 1       37      0       g       350002048900000000,350002048900000001   0-36,0-36    Genome mapability - Musings from a PhD candidate davetang.org    0       0               OR4F5
    585     chr1    69170   69479   3500359797      142             69170   69479   8421504 2       76,66   0,243   c       350035979700000000,350035979700000002   0-76,10-76    CRAM compression and TLEN SAM's field - SEQanswers      seqanswers.com  0       0               OR4F5
    585     chr1    70013   70230   3500427570      150             70013   70230   8421504 2       75,75   0,142   g       350042757000000000,350042757000000001   0-75,0-75     Inconsistency with SAM flag output? - SEQanswers        seqanswers.com  0       0               OR4F5
    585     chr1    98860   98888   3500207083      26              98860   98888   8421504 3       5,7,14  0,6,14  g       350020708300000108,350020708300000060,350020708300000239      0-24,0-21,0-21                                          Method For The Simultaneous Determination Of Blood Group And Platelet Antigen Genotypes         .freshpatents.com     0       0               OR4F5
    586     chr1    137603  138008  3500170315      405             137603  138008  8421504 1       405     0       p       350017031500015076,350017031500015074   0-135,0-270  Balding D. (2007) Handbook of Statistical Genetics       www.scribd.com  0       0               OR4F5
    586     chr1    139485  143008  3500419332      1794            139485  143008  8421504 2       65,1729 0,1794  g       350041933200000004,350041933200000000,350041933200000001,350041933200000002,350041933200000003        0-1263,0-1859,0-1852,0-1860,0-576                                               PPT  Evolution by Genome Duplication PowerPoint presentation | free to view   www.powershow.com       0       0               OR4F5
    586     chr1    141535  143008  3500270480      1372            141535  143008  8421504 24      57,60,58,59,61,59,60,59,59,62,61,58,60,58,16,59,59,59,59,57,57,59,58,58 0,61,125,187,250,314,377,441,503,566,631,695,756,819,881,919,981,1044,1107,1170,1230,1291,1353,1415   g       350027048000000003,350027048000000002   0-902,0-525                  Chen-Kung Chou 3-22-2004 www.dls.ym.edu.tw       0       0               OR4F5
  • I want to generate a BED file: 'chrom/start/end/country'. The 23rd column contains the URL of the web-sequence. I use the domain of the URL to try to guess the country. The following awk script was used to generate the file:
    BEGIN   {
            FS="[\t]";
            }
    
            {
            country=$23;
            for(;;)
                    {
                    slash=index(country,"/");
                    if(slash==0) break;
                    country=substr(country,1,slash-1);
                    }
            for(;;)
                    {
                    colon=index(country,":");
                    if(colon==0) break;
                    country=substr(country,1,colon-1);
                    }
            if( country ~ /\.$/ ) next;
            if( country ~ /\.com$/ ) next;
            if( country ~ /\.org$/ ) next;
            if( country ~ /\.cat$/ ) next;
            if( country ~ /\.net$/ ) next;
            if( country ~ /\.gov$/ ) next;
            if( country ~ /\.edu$/ ) next;
            if( country ~ /\.name$/ ) next;
            if( country ~ /\.info$/ ) next;
            if( country ~ /\.biz$/ ) next;
            if( country ~ /\.[0-9]+$/ ) next;
            if( index(country,".")==0) next;
            if( index(country," ")!=0) next;
            for(;;)
                    {
                    dot=index(country,".");
                    if(dot==0) break;
                    country=substr(country,dot+1);
                    }
    
                    if( country== "af") {country="afghanistan";}
                    else if( country== "ax") {country="Ålandislands";}
                    else if( country== "al") {country="albania";}
                    else if( country== "dz") {country="algeria";}
                    else if( country== "as") {country="americansamoa";}
                    else if( country== "ad") {country="andorra";}
                    else if( country== "ao") {country="angola";}
                    else if( country== "ai") {country="anguilla";}
                    else if( country== "aq") {country="antarctica";}
                    else if( country== "ag") {country="antiguaandbarbuda";}
                    else if( country== "ar") {country="argentina";}
                    else if( country== "am") {country="armenia";}
                    else if( country== "aw") {country="aruba";}
                    else if( country== "au") {country="australia";}
                    else if( country== "at") {country="austria";}
                    else if( country== "az") {country="azerbaijan";}
                    else if( country== "bs") {country="bahamas";}
                    else if( country== "bh") {country="bahrain";}
                    else if( country== "bd") {country="bangladesh";}
                    else if( country== "bb") {country="barbados";}
                    else if( country== "by") {country="belarus";}
                    else if( country== "be") {country="belgium";}
                    else if( country== "bz") {country="belize";}
                    else if( country== "bj") {country="benin";}
                    else if( country== "bm") {country="bermuda";}
                    else if( country== "bt") {country="bhutan";}
                    else if( country== "bo") {country="bolivia,plurinationalstateof";}
                    else if( country== "bq") {country="bonaire,sinteustatiusandsaba";}
                    else if( country== "ba") {country="bosniaandherzegovina";}
                    else if( country== "bw") {country="botswana";}
                    else if( country== "bv") {country="bouvetisland";}
                    else if( country== "br") {country="brazil";}
                    else if( country== "io") {country="britishindianoceanterritory";}
                    else if( country== "bn") {country="bruneidarussalam";}
                    else if( country== "bg") {country="bulgaria";}
                    else if( country== "bf") {country="burkinafaso";}
                    else if( country== "bi") {country="burundi";}
                    else if( country== "kh") {country="cambodia";}
                    else if( country== "cm") {country="cameroon";}
                    else if( country== "ca") {country="canada";}
                    else if( country== "cv") {country="capeverde";}
                    else if( country== "ky") {country="caymanislands";}
                    else if( country== "cf") {country="centralafricanrepublic";}
                    else if( country== "td") {country="chad";}
                    else if( country== "cl") {country="chile";}
                    else if( country== "cn") {country="china";}
                    else if( country== "cx") {country="christmasisland";}
                    else if( country== "cc") {country="cocos(keeling)islands";}
                    else if( country== "co") {country="colombia";}
                    else if( country== "km") {country="comoros";}
                    else if( country== "cg") {country="congo";}
                    else if( country== "cd") {country="congo,thedemocraticrepublicofthe";}
                    else if( country== "ck") {country="cookislands";}
                    else if( country== "cr") {country="costarica";}
                    else if( country== "ci") {country="cÔted'ivoire";}
                    else if( country== "hr") {country="croatia";}
                    else if( country== "cu") {country="cuba";}
                    else if( country== "cw") {country="curaÇao";}
                    else if( country== "cy") {country="cyprus";}
                    else if( country== "cz") {country="czechrepublic";}
                    else if( country== "dk") {country="denmark";}
                    else if( country== "dj") {country="djibouti";}
                    else if( country== "dm") {country="dominica";}
                    else if( country== "do") {country="dominicanrepublic";}
                    else if( country== "ec") {country="ecuador";}
                    else if( country== "eg") {country="egypt";}
                    else if( country== "sv") {country="elsalvador";}
                    else if( country== "gq") {country="equatorialguinea";}
                    else if( country== "er") {country="eritrea";}
                    else if( country== "ee") {country="estonia";}
                    else if( country== "et") {country="ethiopia";}
                    else if( country== "fk") {country="falklandislands(malvinas)";}
                    else if( country== "fo") {country="faroeislands";}
                    else if( country== "fj") {country="fiji";}
                    else if( country== "fi") {country="finland";}
                    else if( country== "fr") {country="france";}
                    else if( country== "gf") {country="frenchguiana";}
                    else if( country== "pf") {country="frenchpolynesia";}
                    else if( country== "tf") {country="frenchsouthernterritories";}
                    else if( country== "ga") {country="gabon";}
                    else if( country== "gm") {country="gambia";}
                    else if( country== "ge") {country="georgia";}
                    else if( country== "de") {country="germany";}
                    else if( country== "gh") {country="ghana";}
                    else if( country== "gi") {country="gibraltar";}
                    else if( country== "gr") {country="greece";}
                    else if( country== "gl") {country="greenland";}
                    else if( country== "gd") {country="grenada";}
                    else if( country== "gp") {country="guadeloupe";}
                    else if( country== "gu") {country="guam";}
                    else if( country== "gt") {country="guatemala";}
                    else if( country== "gg") {country="guernsey";}
                    else if( country== "gn") {country="guinea";}
                    else if( country== "gw") {country="guinea-bissau";}
                    else if( country== "gy") {country="guyana";}
                    else if( country== "ht") {country="haiti";}
                    else if( country== "hm") {country="heardislandandmcdonaldislands";}
                    else if( country== "va") {country="holysee(vaticancitystate)";}
                    else if( country== "hn") {country="honduras";}
                    else if( country== "hk") {country="china";}
                    else if( country== "hu") {country="hungary";}
                    else if( country== "is") {country="iceland";}
                    else if( country== "in") {country="india";}
                    else if( country== "id") {country="indonesia";}
                    else if( country== "ir") {country="iran";}
                    else if( country== "iq") {country="iraq";}
                    else if( country== "ie") {country="ireland";}
                    else if( country== "im") {country="isleofman";}
                    else if( country== "il") {country="israel";}
                    else if( country== "it") {country="italy";}
                    else if( country== "jm") {country="jamaica";}
                    else if( country== "jp") {country="japan";}
                    else if( country== "je") {country="jersey";}
                    else if( country== "jo") {country="jordan";}
                    else if( country== "kz") {country="kazakhstan";}
                    else if( country== "ke") {country="kenya";}
                    else if( country== "ki") {country="kiribati";}
                    else if( country== "kp") {country="northkorea";}
                    else if( country== "kr") {country="southkorea";}
                    else if( country== "kw") {country="kuwait";}
                    else if( country== "kg") {country="kyrgyzstan";}
                    else if( country== "la") {country="laopeople'sdemocraticrepublic";}
                    else if( country== "lv") {country="latvia";}
                    else if( country== "lb") {country="lebanon";}
                    else if( country== "ls") {country="lesotho";}
                    else if( country== "lr") {country="liberia";}
                    else if( country== "ly") {country="libya";}
                    else if( country== "li") {country="liechtenstein";}
                    else if( country== "lt") {country="lithuania";}
                    else if( country== "lu") {country="luxembourg";}
                    else if( country== "mo") {country="macao";}
                    else if( country== "mk") {country="macedonia,theformeryugoslavrepublicof";}
                    else if( country== "mg") {country="madagascar";}
                    else if( country== "mw") {country="malawi";}
                    else if( country== "my") {country="malaysia";}
                    else if( country== "mv") {country="maldives";}
                    else if( country== "ml") {country="mali";}
                    else if( country== "mt") {country="malta";}
                    else if( country== "mh") {country="marshallislands";}
                    else if( country== "mq") {country="martinique";}
                    else if( country== "mr") {country="mauritania";}
                    else if( country== "mu") {country="mauritius";}
                    else if( country== "yt") {country="mayotte";}
                    else if( country== "mx") {country="mexico";}
                    else if( country== "fm") {country="micronesia,federatedstatesof";}
                    else if( country== "md") {country="moldova,republicof";}
                    else if( country== "mc") {country="monaco";}
                    else if( country== "mn") {country="mongolia";}
                    else if( country== "me") {country="montenegro";}
                    else if( country== "ms") {country="montserrat";}
                    else if( country== "ma") {country="morocco";}
                    else if( country== "mz") {country="mozambique";}
                    else if( country== "mm") {country="myanmar";}
                    else if( country== "na") {country="namibia";}
                    else if( country== "nr") {country="nauru";}
                    else if( country== "np") {country="nepal";}
                    else if( country== "nl") {country="netherlands";}
                    else if( country== "nc") {country="newcaledonia";}
                    else if( country== "nz") {country="newzealand";}
                    else if( country== "ni") {country="nicaragua";}
                    else if( country== "ne") {country="niger";}
                    else if( country== "ng") {country="nigeria";}
                    else if( country== "nu") {country="niue";}
                    else if( country== "nf") {country="norfolkisland";}
                    else if( country== "mp") {country="northernmarianaislands";}
                    else if( country== "no") {country="norway";}
                    else if( country== "om") {country="oman";}
                    else if( country== "pk") {country="pakistan";}
                    else if( country== "pw") {country="palau";}
                    else if( country== "ps") {country="palestine,stateof";}
                    else if( country== "pa") {country="panama";}
                    else if( country== "pg") {country="papuanewguinea";}
                    else if( country== "py") {country="paraguay";}
                    else if( country== "pe") {country="peru";}
                    else if( country== "ph") {country="philippines";}
                    else if( country== "pn") {country="pitcairn";}
                    else if( country== "pl") {country="poland";}
                    else if( country== "pt") {country="portugal";}
                    else if( country== "pr") {country="puertorico";}
                    else if( country== "qa") {country="qatar";}
                    else if( country== "re") {country="france";}
                    else if( country== "ro") {country="romania";}
                    else if( country== "ru") {country="russia";}
                    else if( country== "rw") {country="rwanda";}
                    else if( country== "bl") {country="saintbarthÉlemy";}
                    else if( country== "sh") {country="sainthelena,ascensionandtristandacunha";}
                    else if( country== "kn") {country="saintkittsandnevis";}
                    else if( country== "lc") {country="saintlucia";}
                    else if( country== "mf") {country="saintmartin(frenchpart)";}
                    else if( country== "pm") {country="saintpierreandmiquelon";}
                    else if( country== "vc") {country="saintvincentandthegrenadines";}
                    else if( country== "ws") {country="samoa";}
                    else if( country== "sm") {country="sanmarino";}
                    else if( country== "st") {country="saotomeandprincipe";}
                    else if( country== "sa") {country="saudiarabia";}
                    else if( country== "sn") {country="senegal";}
                    else if( country== "rs") {country="serbia";}
                    else if( country== "sc") {country="seychelles";}
                    else if( country== "sl") {country="sierraleone";}
                    else if( country== "sg") {country="singapore";}
                    else if( country== "sx") {country="sintmaarten(dutchpart)";}
                    else if( country== "sk") {country="slovakia";}
                    else if( country== "si") {country="slovenia";}
                    else if( country== "sb") {country="solomonislands";}
                    else if( country== "so") {country="somalia";}
                    else if( country== "za") {country="southafrica";}
                    else if( country== "gs") {country="southgeorgiaandthesouthsandwichislands";}
                    else if( country== "ss") {country="southsudan";}
                    else if( country== "es") {country="spain";}
                    else if( country== "lk") {country="srilanka";}
                    else if( country== "sd") {country="sudan";}
                    else if( country== "sr") {country="suriname";}
                    else if( country== "sj") {country="svalbardandjanmayen";}
                    else if( country== "sz") {country="swaziland";}
                    else if( country== "se") {country="sweden";}
                    else if( country== "ch") {country="switzerland";}
                    else if( country== "sy") {country="syrianarabrepublic";}
                    else if( country== "tw") {country="taiwan";}
                    else if( country== "tj") {country="tajikistan";}
                    else if( country== "tz") {country="tanzania";}
                    else if( country== "th") {country="thailand";}
                    else if( country== "tl") {country="timor-leste";}
                    else if( country== "tg") {country="togo";}
                    else if( country== "tk") {country="tokelau";}
                    else if( country== "to") {country="tonga";}
                    else if( country== "tt") {country="trinidadandtobago";}
                    else if( country== "tn") {country="tunisia";}
                    else if( country== "tr") {country="turkey";}
                    else if( country== "tm") {country="turkmenistan";}
                    else if( country== "tc") {country="turksandcaicosislands";}
                    else if( country== "tv") {country="tuvalu";}
                    else if( country== "ug") {country="uganda";}
                    else if( country== "ua") {country="ukraine";}
                    else if( country== "ae") {country="unitedarabemirates";}
                    else if( country== "gb") {country="unitedkingdom";}
                    else if( country== "uk") {country="unitedkingdom";}
                    else if( country== "us") {country="USA";}
                    else if( country== "um") {country="unitedstatesminoroutlyingislands";}
                    else if( country== "uy") {country="uruguay";}
                    else if( country== "uz") {country="uzbekistan";}
                    else if( country== "vu") {country="vanuatu";}
                    else if( country== "ve") {country="venezuela";}
                    else if( country== "vn") {country="vietnam";}
                    else if( country== "vg") {country="virginislands,british";}
                    else if( country== "vi") {country="virginislands,u.s.";}
                    else if( country== "wf") {country="wallisandfutuna";}
                    else if( country== "eh") {country="westernsahara";}
                    else if( country== "ye") {country="yemen";}
                    else if( country== "zm") {country="zambia";}
                    else if( country== "zw") {country="zimbabwe";}
                    else { next;}
    
            printf("%s\t%s\t%s\t%s\n",$2,$3,$4,country);
            }
    
  • For the world map, I used an SVG vector map from Wikimedia Commons: https://commons.wikimedia.org/wiki/File:World_V2.0.svg.

    The boundary coordinates of each country are defined in an SVG 'path' element whose 'id' attribute is the country name:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/
    <svg xmlns="http://www.w3.org/2000/svg" width="8.88889in" height="4.44444in" viewBox="0 0 800 400">
      <path id="Taiwan" fill="none" stroke="black" stroke-width="1" d="M 668.85,151.22  C 668.98,150.71 ...
      <path id="Estonia" fill="none" stroke="black" stroke-width="1" d="M 460.75,68.26  C 459.95,68.11 4 ...
      <path id="Latvia" fill="none" stroke="black" stroke-width="1" d="M 461.23,72.27  C 460.75,72.20 46 ...
      <path id="Lithuania" fill="none" stroke="black" stroke-width="1" d="M 452.39,79.42  C 452.67,79.72 ...
      <path id="Byelarus" fill="none" stroke="black" stroke-width="1" d="M 453.57,81.92  C 453.87,82.37  ...
      <path id="Ukraine" fill="none" stroke="black" stroke-width="1" d="M 453.09,85.95  C 453.43,86.30 4 ...
      <path id="Moldova" fill="none" stroke="black" stroke-width="1" d="M 460.57,93.00  C 461.00,93.70 4 ...
      <path id="Syria" fill="none" stroke="black" stroke-width="1" d="M 480.33,127.61  C 481.03,127.90 4 ...
      <path id="Turkey" fill="none" stroke="black" stroke-width="1" d="M 499.47,116.91  C 499.31,116.32  ...
      <path id="Kuwait" fill="none" stroke="black" stroke-width="1" d="M 505.26,133.84  C 504.97,134.56  ...
      <path id="Saudi Arabia" fill="none" stroke="black" stroke-width="1" d="M 495.83,163.75  C 496.31,1 ...
      <path id="United Arab Emirates" fill="none" stroke="black" stroke-width="1" d="M 516.03,150.12  C  ...
      <path id="Yemen" fill="none" stroke="black" stroke-width="1" d="M 517.68,161.79  C 517.01,160.50 5 ...
      <path id="Slovenia" fill="none" stroke="black" stroke-width="1" d="M 430.55,97.07  C 429.62,97.36  ...
      <path id="Croatia" fill="none" stroke="black" stroke-width="1" d="M 439.09,103.97  C 439.01,103.46 ...
      <path id="Bosnia and Herzegovina" fill="none" stroke="black" stroke-width="1" d="M 440.44,105.02   ...
    (...)
  • I joined the data using a custom Java program (available on GitHub at: https://github.com/lindenb/jvarkit/wiki/WorldMapGenome ). The program converts each 'path' element into a GeneralPath (java.awt.geom). It is invoked as follows:
    $  cat map.bed |\
         java -jar dist/worldmapgenome.jar \
         -u World_V2.0.svg \
         -w 2000 -o ~/output.jpg \
         -R hg19.fasta
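
To illustrate how the 'd' attribute of an SVG 'path' can be turned into a GeneralPath, here is a minimal sketch. It is not the code used in worldmapgenome: the class name SvgPathSketch and the method toGeneralPath are hypothetical, and it assumes only the absolute commands M, L, C and Z, which covers the commands visible in the excerpt above but not the full SVG path grammar.

```java
import java.awt.geom.GeneralPath;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of jvarkit).
class SvgPathSketch {
    // A token is either a command letter or a decimal number.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z]|[-+]?\\d*\\.?\\d+(?:[eE][-+]?\\d+)?");

    // Reads the number at position idx, failing clearly on truncated input.
    private static double num(List<String> tokens, int idx) {
        if (idx >= tokens.size()) {
            throw new IllegalArgumentException("Truncated path data");
        }
        return Double.parseDouble(tokens.get(idx));
    }

    // Converts path data using absolute M, L, C and Z into a GeneralPath.
    static GeneralPath toGeneralPath(String d) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(d);
        while (m.find()) tokens.add(m.group());

        GeneralPath path = new GeneralPath();
        char cmd = 'M';
        int i = 0;
        while (i < tokens.size()) {
            String tok = tokens.get(i);
            if (Character.isLetter(tok.charAt(0))) {
                cmd = tok.charAt(0);
                if ("MLCZ".indexOf(cmd) < 0) {
                    throw new IllegalArgumentException("Unsupported command: " + cmd);
                }
                i++;
                if (cmd == 'Z') path.closePath();
                continue;
            }
            switch (cmd) {
                case 'M': // move to; extra coordinate pairs are implicit lines
                    path.moveTo(num(tokens, i), num(tokens, i + 1));
                    i += 2;
                    cmd = 'L';
                    break;
                case 'L': // straight line
                    path.lineTo(num(tokens, i), num(tokens, i + 1));
                    i += 2;
                    break;
                case 'C': // cubic Bezier: two control points, then the end point
                    path.curveTo(num(tokens, i), num(tokens, i + 1),
                                 num(tokens, i + 2), num(tokens, i + 3),
                                 num(tokens, i + 4), num(tokens, i + 5));
                    i += 6;
                    break;
                default:  // numbers after Z are not valid here
                    throw new IllegalArgumentException("Unexpected number after " + cmd);
            }
        }
        return path;
    }

    public static void main(String[] args) {
        // A made-up triangle, not a real country outline.
        GeneralPath p = toGeneralPath("M 0,0 L 10,0 L 10,5 Z");
        System.out.println("width=" + p.getBounds2D().getWidth()
            + " height=" + p.getBounds2D().getHeight());
        System.out.println("contains(8,1)=" + p.contains(8, 1));
    }
}
```

Once a country outline is a GeneralPath, it can be scaled with an AffineTransform and filled or clipped with Graphics2D, which is presumably why this representation is convenient for drawing on top of the map.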
That's it,
Pierre