30 January 2014

Parallelizing #RStats using #make

In the current post, I'll show how to use R as the main SHELL of GNU-Make instead of using a classical linux shell like 'bash'. Why would you do this ?

  • awesomeness
  • Make-based workflow management
  • Make-based execution with --jobs. GNU make knows how to execute several recipes at once. Normally, make will execute only one recipe at a time, waiting for it to finish before executing the next. However, the '-j' or '--jobs' option tells make to execute many recipes simultaneously.
The following recipe has been tested with GNU-Make 4.0 and I'm not sure it would world with '<=3.81'.

The only problem is that R doesn't accept a multiline-argument on the command line (see http://stackoverflow.com/questions/21442674) so I created a wrapper 'mockR' that save the argument '-e "code"' into a file and pipe it into R:
#!/bin/bash
while getopts "e:" opt; do
case $opt in
e)
R=$OPTARG
;;
\:)
echo "Argument missing: -$OPTARG" >&2
exit -1
;;
\?)
echo "Invalid option: -$OPTARG" >&2
exit -1
;;
esac
done
if [[ -n "$R" ]]
then
TMP=$(mktemp)
trap "{ rm -f ${TMP}; exit 255; }" SIGINT
echo "${R}" > ${TMP}
R --vanilla --no-readline --quiet < ${TMP}
rm -f ${TMP}
else
echo "Undefined script" >&2
exit -1
fi
view raw mockR hosted with ❤ by GitHub

(Edit1: A comment from madscientist : Re your script; you can save yourself some wear-and-tear on your disk and avoid the need for temp files and cleanup by just piping the input directly: echo "$R" | R --vanilla --no-readline --quiet . Just a thought. ")

(Edit2: the exit value of 'R' should also be returned by 'mockR'.)

This file is set as executable:
$ chmod u+x ./mockR
In the makefile, we tell 'make' to use 'mockR' instead of '/usr/bin/sh':
SHELL = ./mockR
The R code will be passed to 'mockR' using the argument '-e "code"'
.SHELLFLAGS= -e
We also set 'ONESHELL': "If .ONESHELL is mentioned as a target, then when a target is built all lines of the recipe will be given to a single invocation of the shell rather than each line being invoked separately"
.ONESHELL:

Example 1

We download the table 'knownGene' from the UCSC and we plot a pdf file 'countExons=f(txStart)'. Please, note that the targets are created using some R statements, NOT bash statements:
.PHONY: all clean
SHELL = ./mockR
.SHELLFLAGS= -e
.ONESHELL:
CHROM=1 2
UCSC=http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/
all: $(foreach C,${CHROM}, chr${C}.pdf)
chr1.pdf: knownGene.txt.gz
gold <- read.delim(gzfile("$<"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr1')
pdf(file="$@")
plot(gold$$txStart,gold$$exonCount)
dev.off()
chr2.pdf: knownGene.txt.gz
gold <- read.delim(gzfile("$<"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr2')
pdf(file="$@")
plot(gold$$txStart,gold$$exonCount)
dev.off()
knownGene.txt.gz:
download.file("${UCSC}/$@","$@")
clean:
$(foreach F,knownGene.txt.gz $(foreach C,${CHROM}, chr${C}.pdf) ,file.remove("$F");)
view raw R01.mk hosted with ❤ by GitHub

Now Invoke make

$ make-4.0/make -f R01.mk
download.file("http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database//knownGene.txt.gz","knownGene.txt.gz")
> download.file("http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database//knownGene.txt.gz","knownGene.txt.gz")
trying URL 'http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database//knownGene.txt.gz'
Content type 'application/x-gzip' length 4270683 bytes (4.1 Mb)
opened URL
==================================================
downloaded 4.1 Mb
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr1')
pdf(file="chr1.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr1')
> pdf(file="chr1.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr2')
pdf(file="chr2.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr2')
> pdf(file="chr2.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
view raw execute01.txt hosted with ❤ by GitHub

Example 2

Using a the eval and the call function we can make the previous 'Makefile' applicable for all the chromosomes:
.PHONY: all clean
SHELL = ./mockR
.SHELLFLAGS= -e
.ONESHELL:
CHROM=1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y
UCSC=http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/
define plot_knownGene
chr$(1).pdf: knownGene.txt.gz
gold <- read.delim(gzfile("$$<"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr$(1)')
pdf(file="$$@")
plot(gold$$$$txStart,gold$$$$exonCount)
dev.off()
endef
all: $(foreach C,${CHROM}, chr${C}.pdf)
$(foreach C,${CHROM}, $(eval $(call plot_knownGene,$C)))
knownGene.txt.gz:
download.file("${UCSC}/$@","$@")
clean:
$(foreach F,knownGene.txt.gz $(foreach C,${CHROM}, chr${C}.pdf) ,file.remove("$F");)
view raw R02.mk hosted with ❤ by GitHub

Now Invoke make USING TRHEE PARALLEL JOBS

$ ./make -j 3 -f R02.mk all
download.file("http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database//knownGene.txt.gz","knownGene.txt.gz")
> download.file("http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database//knownGene.txt.gz","knownGene.txt.gz")
trying URL 'http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database//knownGene.txt.gz'
Content type 'application/x-gzip' length 4270683 bytes (4.1 Mb)
opened URL
==================================================
downloaded 4.1 Mb
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr1')
pdf(file="chr1.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr2')
pdf(file="chr2.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr3')
pdf(file="chr3.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr3')
> gold=subset(gold,chrom=='chr1')
> pdf(file="chr3.pdf")
> gold=subset(gold,chrom=='chr2')
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr4')
pdf(file="chr4.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> pdf(file="chr1.pdf")
> pdf(file="chr2.pdf")
> plot(gold$txStart,gold$exonCount)
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr5')
pdf(file="chr5.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr6')
pdf(file="chr6.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr5')
> gold=subset(gold,chrom=='chr6')
> pdf(file="chr5.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr7')
pdf(file="chr7.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> pdf(file="chr6.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr8')
pdf(file="chr8.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr4')
> pdf(file="chr4.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr9')
pdf(file="chr9.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr7')
> pdf(file="chr7.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr10')
pdf(file="chr10.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr9')
> pdf(file="chr9.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr11')
pdf(file="chr11.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> gold=subset(gold,chrom=='chr8')
> pdf(file="chr8.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr12')
pdf(file="chr12.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr10')
> pdf(file="chr10.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr13')
pdf(file="chr13.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr11')
> pdf(file="chr11.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr14')
pdf(file="chr14.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr12')
> pdf(file="chr12.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr15')
pdf(file="chr15.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr13')
> pdf(file="chr13.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr16')
pdf(file="chr16.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr14')
> pdf(file="chr14.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr17')
pdf(file="chr17.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr15')
> pdf(file="chr15.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr18')
pdf(file="chr18.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr16')
> pdf(file="chr16.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr19')
pdf(file="chr19.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr17')
> pdf(file="chr17.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr20')
pdf(file="chr20.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr18')
> pdf(file="chr18.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr21')
pdf(file="chr21.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr19')
> pdf(file="chr19.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chr22')
pdf(file="chr22.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr20')
> pdf(file="chr20.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chrX')
pdf(file="chrX.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> gold=subset(gold,chrom=='chr21')
> pdf(file="chr21.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
gold <- read.delim(gzfile("knownGene.txt.gz"))
colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
gold=subset(gold,chrom=='chrY')
pdf(file="chrY.pdf")
plot(gold$txStart,gold$exonCount)
dev.off()
> gold <- read.delim(gzfile("knownGene.txt.gz"))
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chr22')
> pdf(file="chr22.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> colnames(gold)= c("name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
> gold=subset(gold,chrom=='chrX')
> gold=subset(gold,chrom=='chrY')
> pdf(file="chrX.pdf")
> pdf(file="chrY.pdf")
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
> plot(gold$txStart,gold$exonCount)
> dev.off()
null device
1
>
view raw execute02.txt hosted with ❤ by GitHub




You can now watch the final pdf files:




That's it,
Pierre