Parallelizing #RStats using #make
In this post, I'll show how to use R as the SHELL of GNU make, instead of a classical Linux shell such as 'bash'. Why would you do this?
- awesomeness
- Make-based workflow management
- Make-based execution with --jobs. GNU make knows how to execute several recipes at once. Normally, make will execute only one recipe at a time, waiting for it to finish before executing the next. However, the '-j' or '--jobs' option tells make to execute many recipes simultaneously.
The only problem is that R doesn't accept a multi-line argument on the command line (see http://stackoverflow.com/questions/21442674), so I created a wrapper, 'mockR', that saves the argument of '-e "code"' into a file and pipes it into R.
(Edit 1: a comment from madscientist: "Re your script; you can save yourself some wear-and-tear on your disk and avoid the need for temp files and cleanup by just piping the input directly:
echo "$R" | R --vanilla --no-readline --quiet
Just a thought.")
(Edit 2: the exit value of 'R' should also be returned by 'mockR'.)
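As a minimal sketch of what such a wrapper could look like (not necessarily the original script), the version below follows the piping suggestion from Edit 1 and returns R's exit status as asked in Edit 2. The 'R_BIN' variable is a hypothetical override added here for illustration only; it defaults to 'R'.

```shell
#!/bin/sh
# mockR (sketch): lets GNU make hand a whole recipe to R.
# Expected call:  ./mockR -e "R code, possibly spanning several lines"
# R_BIN is a hypothetical override (an assumption of this sketch); default is 'R'.

mockR() {
    if [ "$#" -ne 2 ] || [ "$1" != "-e" ]; then
        echo 'usage: mockR -e "R code"' >&2
        return 2
    fi
    # Feed the code to R on stdin instead of the command line.
    # The status of a pipeline is the status of its last command,
    # so the function returns whatever R returned.
    printf '%s\n' "$2" | "${R_BIN:-R}" --vanilla --no-readline --quiet
}

# When saved as an executable script, forward the arguments received from make.
if [ "$#" -gt 0 ]; then
    mockR "$@"
fi
```

Because the code travels through stdin, no temporary file has to be written or cleaned up, and a failing R statement makes the wrapper fail too, so 'make' stops on the broken target.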
This file is set as executable:
$ chmod u+x ./mockR
In the makefile, we tell 'make' to use 'mockR' instead of the default '/bin/sh':
SHELL = ./mockR
The R code will be passed to 'mockR' using the argument '-e "code"':
.SHELLFLAGS= -e
We also set 'ONESHELL': "If .ONESHELL is mentioned as a target, then when a target is built all lines of the recipe will be given to a single invocation of the shell rather than each line being invoked separately":
.ONESHELL:
Example 1
We download the table 'knownGene' from UCSC and plot 'countExons=f(txStart)' to a PDF file. Please note that the targets are built with R statements, NOT bash statements:
Now invoke make:
Example 2
Using the 'eval' and 'call' functions, we can make the previous 'Makefile' applicable to all the chromosomes:
Now invoke make using THREE parallel jobs (for example 'make -j 3'):
You can now look at the final PDF files:
That's it,
Pierre