writing #rstats bindings for bwa-mem, my notebook.
I wanted to learn how to bind a C library to R, so I've created the following bindings for BWA. My code is available on github at :
Most of the C code was inspired from Heng Li's code https://github.com/lh3/bwa/blob/master/example.c.A short description of the C code
In https://github.com/lindenb/rbwa/blob/master/rbwa.c:- struct RBwaHandler
- This structure holds a pointer to the bwa-index (
bwaidx_t) and to the options for bwa (mem_opt_t). - RBwaOpen(filename)
- This methods opens the bwa index, wrap the pointer in a 'R' tructure using
R_MakeExternalPtrand registers a destructor '_RBwaFinalizer' to be called by the garbage manager. - _RBwaFinalizer(handler)
- This is the destructor called by the garbage manager. It calls
'RBwaClose' - RBwaClose(handler)
- retrieves the pointer to the 'struct RBwaHandler' using
'R_ExternalPtrAddr', disposes the resources, free theRBwaHandlerusing 'R_ClearExternalPtr' - RBwaMap(handler,DNAsequence)
- This is the workhorse of the code: it retrieves the pointer to the 'struct RBwaHandler' using
'R_ExternalPtrAddr', creates a "data.frame" with 6 columns (chromosome, position, strand, mapq, NM, secondary), maps the DNA sequence by calling bwa::mem_align1 and insert the hits in the data.frame.
The R code
See https://github.com/lindenb/rbwa/blob/master/rbwa.R. This R code loads the dynamic C libraries and declares the R functions calling the previous C functions using .Call:- bwa.open(filename)
- bwa.close(bwt)
- bwa.map(bwt,dnasequence)
Example
source("rbwa.R")
bwt <- bwa.open("test_files/chrM.fa")
for(s in c(
"GCATGTGTAATCTTACTAAGAGCTAATAGAAAGGCTAGGACCAAACCTAT",
"GCATGTGTAATCTTACTAAGCTAATAGAAAGGCTAGGACCAAACCTAT",
"CTATCTACTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTAT",
"TACTAAACCC",
"GCGAACCCAACTTCGATTCCCTCGCCGATCTCCGACGGAGCCGTGTGCAT"
))
{
print(paste("QUERY:",s));
hits<-bwa.map(bwt,s)
print(hits)
}
bwa.close(bwt);
here is the R output:> source("rbwa.R")
>
> bwt <- bwa.open("test_files/chrM.fa")
>
> for(s in c(
+ "GCATGTGTAATCTTACTAAGAGCTAATAGAAAGGCTAGGACCAAACCTAT",
+ "GCATGTGTAATCTTACTAAGCTAATAGAAAGGCTAGGACCAAACCTAT",
+ "CTATCTACTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTAT",
+ "TACTAAACCC",
+ "GCGAACCCAACTTCGATTCCCTCGCCGATCTCCGACGGAGCCGTGTGCAT"
+ ))
+ {
+ print(paste("QUERY:",s));
+ hits<-bwa.map(bwt,s)
+ print(hits)
+ }
[1] "QUERY: GCATGTGTAATCTTACTAAGAGCTAATAGAAAGGCTAGGACCAAACCTAT"
chrom pos strand mapq NM secondary
1 chrM 650 1 60 0 0
[1] "QUERY: GCATGTGTAATCTTACTAAGCTAATAGAAAGGCTAGGACCAAACCTAT"
chrom pos strand mapq NM secondary
1 chrM 650 1 60 2 0
[1] "QUERY: CTATCTACTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTAT"
chrom pos strand mapq NM secondary
1 chrM 3100 0 60 4 0
[1] "QUERY: TACTAAACCC"
[1] chrom pos strand mapq NM secondary
<0 rows> (or 0-length row.names)
[1] "QUERY: GCGAACCCAACTTCGATTCCCTCGCCGATCTCCGACGGAGCCGTGTGCAT"
[1] chrom pos strand mapq NM secondary
<0 rows> (or 0-length row.names)
>
> bwa.close(bwt);
[1] TRUE
That's it,
Pierre

