PostScript as a Programming Language for Bioinformatics: mynotebook
"PostScript (PS) is an interpreted, stack-based programming language. It is best known for its use as a page description language in the electronic and desktop publishing areas."[wikipedia]. In this post, I'll show how I've used to create a simple and lightweight view
of the genome.
Introduction: just a simple postscript program
The following PS program fills a rectangular gray shape; You can display the result using ghostview, a2ps, etc...%!PS newpath 50 50 moveto 0 100 rlineto 100 0 rlineto 0 -100 rlineto closepath 0.5 setgray fill showpage
Some global variables
The page width
/screenWidth 1000 def
The page width
/screenHeight 1000 def
The minimum 5' position
/minChromStart 1E9 def
The maximum 3' position
/maxChromEnd -1 def
The size of a genomic feature
/featureHeight 20 def
The distance between two 'ticks' for drawing the orientation
/ticksx 20 def
The font size
/theFontSize 9 defThe variable knownGene is a PS array of genes.
/knownGene [ [(uc002zkr.3) (chr22) (-) 161242... ...] ] def
Each Gene is a PS array holding the structure of the UCSC knownGene table, that is to say: name , chromosome, txStart, txEnd, cdsStart, cdsEnd, exonStarts, exonEnds:
[(uc002zmh.2) (chr22) (-) 17618410 17646177 17618910 17646134 [17618410 17619439 17621948 17623987 17625913 17629337 17630431 17646098 ] [17619247 17619628 17622123 17624021 17626007 17629450 17630635 17646177 ] ]. a simple command line can be used to fetch those data:
% curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" |\
gunzip -c | grep chr22 | head -n 20 |\
awk '{printf("[(%s) (%s) (%s) %s %s %s %s [%s] [%s] ]\n",$1,$2,$3,$4,$5,$6,$7,$9,$10);}' |\
tr "," " " > result.txt
Some utilities
converting a PS object to string
/toString
{
20 string cvs
} bind def
Converting a string to interger (loop over each character an increase the current value)
/toInteger
{
3 dict begin
/s exch def
/i 0 def
/n 0 def
s {
n 10 mul /n exch def
s i get 48 sub n add /n exch def %48=ascii('0')
i 1 add /i exch def
} forall
n % leave n on the stack
end
} bind def
Convert a genomic position to a index on the page 'x' axis
/convertPos2pixel
{
minChromStart sub maxChromEnd minChromStart sub div screenWidth mul
} bind def
Extract the chromosome (that is to say, extract the 1st element of the current array on the stack)
/getChrom
{
1 get
} bind def
Create a hyperlink to the UCSC genome browser
/getHyperLink
{
3 dict begin
/E exch def %% END
/S exch def %% START
/C exch def %% CHROMOSOME
[ (http://genome.ucsc.edu/cgi-bin/hgTracks?position=) C (:) S toString (-) E toString (&) (&db=hg19) ] concatstringarray
end
} bind def
Paint a rectangle
/box
{
4 dict begin
/height exch def
/width exch def
/y exch def
/x exch def
x y moveto
width 0 rlineto
0 height rlineto
width -1 mul 0 rlineto
0 height -1 mul rlineto
end
} bind def
Paint a gray gradient
/gradient
{
4 dict begin
/height exch def
/width exch def
/y exch def
/x exch def
/i 0 def
height 2 div /i exch def
0 1 height 2 div {
1 i height 2.0 div div sub setgray
newpath
x
y height 2 div i sub add
width
i 2 mul
box
closepath
fill
i 1 sub /i exch def
}for
newpath
0 setgray
0.4 setlinewidth
x y width height box
closepath
stroke
end
} bind def
Methods extracting a data about the current gene on the PS stack.
Extract the transcription start:/getTxStart
{
3 get
} bind def
Extract the transcription end:
/getTxEnd
{
4 get
} bind def
Extract the CDS start:
/getCdsStart
{
5 get
} bind def
Extract the transcription end:
/getCdsEnd
{
6 get
} bind def
Extract the strand:
/getStrand
{
2 get (+) eq {1} {-1} ifelse
} bind def
Get the gene name/getKgName
{
0 get
} bind def
Get the number of exons:
/getExonCount
{
7 get length
} bind def
Get the start position of the i-th exon:
/getExonStart
{
2 dict begin
/i exch def
/gene exch def
gene 7 get i get
end
} bind def
Get the end position of the i-th exon:
/getExonEnd
{
2 dict begin
/i exch def
/gene exch def
gene 8 get i get
end
} bind def
Should we draw this gene on the page ?
/isVisible
{
1 dict begin
/gene exch def
minChromStart gene getTxEnd gt
{
false
}
{
gene getTxStart maxChromEnd gt
{
false
}
{
true
}ifelse
}ifelse
end
}bind def
Methods for an array of genes
Loop over the genes and extract the lowest 5' index:/getMinChromStart
{
3 dict begin
/genes exch def
/pos 10E9 def
/i 0 def
genes length {
genes i get getTxStart pos min /pos exch def
i 1 add /i exch def
}repeat
pos
end
} bind def
Loop over the genes and extract the highest 3' index:
/getMaxChromEnd
{
3 dict begin
/genes exch def
/pos -1E9 def
/i 0 def
genes length {
genes i get getTxEnd pos max /pos exch def
i 1 add /i exch def
}repeat
pos
end
} bind def
Painting ONE Gene
/paintGene
{
5 dict begin
/gene exch def %% the GENE argument
/midy featureHeight 2.0 div def %the middle of the row
/x0 gene getTxStart convertPos2pixel def % 5' side of the gene in pixel
/x1 gene getTxEnd convertPos2pixel def % 3' side of the gene in pixel
/i 0 def
0.1 setlinewidth
1 0 0 setrgbcolor
newpath
x0 midy moveto
x1 midy lineto
closepath
stroke
% paint ticks
0 1 x1 x0 sub ticksx div{
newpath
gene getStrand 1 eq
{
x0 ticksHeight sub i add midy ticksHeight add moveto
x0 i add midy lineto
x0 ticksHeight sub i add midy ticksHeight sub lineto
}
%else
{
x0 ticksHeight add i add midy ticksHeight add moveto
x0 i add midy lineto
x0 ticksHeight add i add midy ticksHeight sub lineto
} ifelse
stroke
i ticksx add /i exch def
} for
%paint Transcript start-end
0 0 1 setrgbcolor
newpath
gene getCdsStart convertPos2pixel
midy cdsHeight 2 div sub
gene getCdsEnd convertPos2pixel gene getCdsStart convertPos2pixel sub
cdsHeight box
closepath
fill
% loop over exons
0 /i exch def
gene getExonCount
{
gene i getExonStart convertPos2pixel
midy exonHeight 2 div sub
gene i getExonEnd convertPos2pixel gene i getExonStart convertPos2pixel sub
exonHeight gradient
i 1 add /i exch def
} repeat
0 setgray
gene getTxEnd convertPos2pixel 10 add midy moveto
gene getKgName show
%URL
[ /Rect [x0 0 x1 1 add featureHeight]
/Border [1 0 0]
/Color [1 0 0]
/Action << /Subtype /URI /URI gene getChrom gene getTxStart gene getTxEnd getHyperLink >>
/Subtype /Link
/ANN pdfmark
end
} bind def
Paint all Genes
/paintGenes
{
3 dict begin
/genes exch def %the GENE argument (an array)
/i 0 def % loop iterator
/j 0 def % row iterator
% draw 10 vertical lines
i 0 /i exch def
0 setgray
0 1 10 {
%draw a vertical line
screenWidth 10 div i mul 0 moveto
screenWidth 10 div i mul screenHeight lineto
stroke
% print the position at the top rotate by 90°
screenWidth 10 div i mul 10 add screenHeight 5 sub moveto
-90 rotate
maxChromEnd minChromStart sub i 10 div mul minChromStart add toString show
90 rotate
i 1 add /i exch def
} for
0 /i exch def
genes length {
genes i get isVisible
{
gsave
0 j featureHeight 2 add mul translate
genes i get paintGene
j 1 add /j exch def
grestore
} if
i 1 add /i exch def
}repeat
end
} bind def
All in one: the postscript code
Open the PS file in ghostview, evince, ...
Zooming ? Yes we can.
Ghostview has an option -Sname=string -Sname=string
-sname=string
Define a name in "systemdict" with a given string as value.
This is different from -d.
In my postscript file, the default values for minChromStart and maxChromEnd are overridden by the user's parameters:
systemdict /userChromStart known {
userChromStart toInteger /minChromStart exch def
} if
systemdict /userChromEnd known {
userChromEnd toInteger /maxChromEnd exch def
} if
That's it,Pierre
3 comments:
That's really lightweight, great! Is there a way to pass the gene list in the command-line instead of copy-paste it in the ".ps" file?
@Tim I don't know, I'm not a PS guru. I briefly searched on the web but found no answer.
Yes PS has file operators, and you can use them in GhistScript. See http://www.ghostscript.com/doc/current/Use.htm#Finding_files
Post a Comment