PostScript as a Programming Language for Bioinformatics: mynotebook
"PostScript (PS) is an interpreted, stack-based programming language. It is best known for its use as a page description language in the electronic and desktop publishing areas."[wikipedia]. In this post, I'll show how I've used to create a simple and lightweight view
of the genome.
Introduction: just a simple postscript program
The following PS program fills a rectangular gray shape; You can display the result using ghostview, a2ps, etc...%!PS newpath 50 50 moveto 0 100 rlineto 100 0 rlineto 0 -100 rlineto closepath 0.5 setgray fill showpage
Some global variables
The page width
/screenWidth 1000 def
The page width
/screenHeight 1000 def
The minimum 5' position
/minChromStart 1E9 def
The maximum 3' position
/maxChromEnd -1 def
The size of a genomic feature
/featureHeight 20 def
The distance between two 'ticks' for drawing the orientation
/ticksx 20 def
The font size
/theFontSize 9 defThe variable knownGene is a PS array of genes.
/knownGene [ [(uc002zkr.3) (chr22) (-) 161242... ...] ] def
Each Gene is a PS array holding the structure of the UCSC knownGene table, that is to say: name , chromosome, txStart, txEnd, cdsStart, cdsEnd, exonStarts, exonEnds:
[(uc002zmh.2) (chr22) (-) 17618410 17646177 17618910 17646134 [17618410 17619439 17621948 17623987 17625913 17629337 17630431 17646098 ] [17619247 17619628 17622123 17624021 17626007 17629450 17630635 17646177 ] ]. a simple command line can be used to fetch those data:
% curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" |\ gunzip -c | grep chr22 | head -n 20 |\ awk '{printf("[(%s) (%s) (%s) %s %s %s %s [%s] [%s] ]\n",$1,$2,$3,$4,$5,$6,$7,$9,$10);}' |\ tr "," " " > result.txt
Some utilities
converting a PS object to string
/toString { 20 string cvs } bind def
Converting a string to interger (loop over each character an increase the current value)
/toInteger { 3 dict begin /s exch def /i 0 def /n 0 def s { n 10 mul /n exch def s i get 48 sub n add /n exch def %48=ascii('0') i 1 add /i exch def } forall n % leave n on the stack end } bind def
Convert a genomic position to a index on the page 'x' axis
/convertPos2pixel { minChromStart sub maxChromEnd minChromStart sub div screenWidth mul } bind def
Extract the chromosome (that is to say, extract the 1st element of the current array on the stack)
/getChrom { 1 get } bind def
Create a hyperlink to the UCSC genome browser
/getHyperLink { 3 dict begin /E exch def %% END /S exch def %% START /C exch def %% CHROMOSOME [ (http://genome.ucsc.edu/cgi-bin/hgTracks?position=) C (:) S toString (-) E toString (&) (&db=hg19) ] concatstringarray end } bind def
Paint a rectangle
/box { 4 dict begin /height exch def /width exch def /y exch def /x exch def x y moveto width 0 rlineto 0 height rlineto width -1 mul 0 rlineto 0 height -1 mul rlineto end } bind def
Paint a gray gradient
/gradient { 4 dict begin /height exch def /width exch def /y exch def /x exch def /i 0 def height 2 div /i exch def 0 1 height 2 div { 1 i height 2.0 div div sub setgray newpath x y height 2 div i sub add width i 2 mul box closepath fill i 1 sub /i exch def }for newpath 0 setgray 0.4 setlinewidth x y width height box closepath stroke end } bind def
Methods extracting a data about the current gene on the PS stack.
Extract the transcription start:/getTxStart { 3 get } bind def
Extract the transcription end:
/getTxEnd { 4 get } bind def
Extract the CDS start:
/getCdsStart { 5 get } bind def
Extract the transcription end:
/getCdsEnd { 6 get } bind def
Extract the strand:
/getStrand { 2 get (+) eq {1} {-1} ifelse } bind defGet the gene name
/getKgName { 0 get } bind def
Get the number of exons:
/getExonCount { 7 get length } bind def
Get the start position of the i-th exon:
/getExonStart { 2 dict begin /i exch def /gene exch def gene 7 get i get end } bind def
Get the end position of the i-th exon:
/getExonEnd { 2 dict begin /i exch def /gene exch def gene 8 get i get end } bind def
Should we draw this gene on the page ?
/isVisible { 1 dict begin /gene exch def minChromStart gene getTxEnd gt { false } { gene getTxStart maxChromEnd gt { false } { true }ifelse }ifelse end }bind def
Methods for an array of genes
Loop over the genes and extract the lowest 5' index:/getMinChromStart { 3 dict begin /genes exch def /pos 10E9 def /i 0 def genes length { genes i get getTxStart pos min /pos exch def i 1 add /i exch def }repeat pos end } bind def
Loop over the genes and extract the highest 3' index:
/getMaxChromEnd { 3 dict begin /genes exch def /pos -1E9 def /i 0 def genes length { genes i get getTxEnd pos max /pos exch def i 1 add /i exch def }repeat pos end } bind def
Painting ONE Gene
/paintGene { 5 dict begin /gene exch def %% the GENE argument /midy featureHeight 2.0 div def %the middle of the row /x0 gene getTxStart convertPos2pixel def % 5' side of the gene in pixel /x1 gene getTxEnd convertPos2pixel def % 3' side of the gene in pixel /i 0 def 0.1 setlinewidth 1 0 0 setrgbcolor newpath x0 midy moveto x1 midy lineto closepath stroke % paint ticks 0 1 x1 x0 sub ticksx div{ newpath gene getStrand 1 eq { x0 ticksHeight sub i add midy ticksHeight add moveto x0 i add midy lineto x0 ticksHeight sub i add midy ticksHeight sub lineto } %else { x0 ticksHeight add i add midy ticksHeight add moveto x0 i add midy lineto x0 ticksHeight add i add midy ticksHeight sub lineto } ifelse stroke i ticksx add /i exch def } for %paint Transcript start-end 0 0 1 setrgbcolor newpath gene getCdsStart convertPos2pixel midy cdsHeight 2 div sub gene getCdsEnd convertPos2pixel gene getCdsStart convertPos2pixel sub cdsHeight box closepath fill % loop over exons 0 /i exch def gene getExonCount { gene i getExonStart convertPos2pixel midy exonHeight 2 div sub gene i getExonEnd convertPos2pixel gene i getExonStart convertPos2pixel sub exonHeight gradient i 1 add /i exch def } repeat 0 setgray gene getTxEnd convertPos2pixel 10 add midy moveto gene getKgName show %URL [ /Rect [x0 0 x1 1 add featureHeight] /Border [1 0 0] /Color [1 0 0] /Action << /Subtype /URI /URI gene getChrom gene getTxStart gene getTxEnd getHyperLink >> /Subtype /Link /ANN pdfmark end } bind def
Paint all Genes
/paintGenes { 3 dict begin /genes exch def %the GENE argument (an array) /i 0 def % loop iterator /j 0 def % row iterator % draw 10 vertical lines i 0 /i exch def 0 setgray 0 1 10 { %draw a vertical line screenWidth 10 div i mul 0 moveto screenWidth 10 div i mul screenHeight lineto stroke % print the position at the top rotate by 90° screenWidth 10 div i mul 10 add screenHeight 5 sub moveto -90 rotate maxChromEnd minChromStart sub i 10 div mul minChromStart add toString show 90 rotate i 1 add /i exch def } for 0 /i exch def genes length { genes i get isVisible { gsave 0 j featureHeight 2 add mul translate genes i get paintGene j 1 add /j exch def grestore } if i 1 add /i exch def }repeat end } bind def
All in one: the postscript code
Open the PS file in ghostview, evince, ...
Zooming ? Yes we can.
Ghostview has an option -Sname=string-Sname=string -sname=string Define a name in "systemdict" with a given string as value. This is different from -d.
In my postscript file, the default values for minChromStart and maxChromEnd are overridden by the user's parameters:
systemdict /userChromStart known { userChromStart toInteger /minChromStart exch def } if systemdict /userChromEnd known { userChromEnd toInteger /maxChromEnd exch def } ifThat's it,
Pierre