writing #rstats bindings for bwa-mem, my notebook.
I wanted to learn how to bind a C library to R, so I've created the following bindings for BWA. My code is available on github at :
Most of the C code was inspired from Heng Li's code https://github.com/lh3/bwa/blob/master/example.c.A short description of the C code
In https://github.com/lindenb/rbwa/blob/master/rbwa.c:- struct RBwaHandler
- This structure holds a pointer to the bwa-index (
bwaidx_t
) and to the options for bwa (mem_opt_t
). - RBwaOpen(filename)
- This methods opens the bwa index, wrap the pointer in a 'R' tructure using
R_MakeExternalPtr
and registers a destructor '_RBwaFinalizer' to be called by the garbage manager. - _RBwaFinalizer(handler)
- This is the destructor called by the garbage manager. It calls
'RBwaClose'
- RBwaClose(handler)
- retrieves the pointer to the 'struct RBwaHandler' using
'R_ExternalPtrAddr'
, disposes the resources, free theRBwaHandler
using 'R_ClearExternalPtr' - RBwaMap(handler,DNAsequence)
- This is the workhorse of the code: it retrieves the pointer to the 'struct RBwaHandler' using
'R_ExternalPtrAddr'
, creates a "data.frame" with 6 columns (chromosome, position, strand, mapq, NM, secondary), maps the DNA sequence by calling bwa::mem_align1 and insert the hits in the data.frame.
The R code
See https://github.com/lindenb/rbwa/blob/master/rbwa.R. This R code loads the dynamic C libraries and declares the R functions calling the previous C functions using .Call:- bwa.open(filename)
- bwa.close(bwt)
- bwa.map(bwt,dnasequence)
Example
source("rbwa.R") bwt <- bwa.open("test_files/chrM.fa") for(s in c( "GCATGTGTAATCTTACTAAGAGCTAATAGAAAGGCTAGGACCAAACCTAT", "GCATGTGTAATCTTACTAAGCTAATAGAAAGGCTAGGACCAAACCTAT", "CTATCTACTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTAT", "TACTAAACCC", "GCGAACCCAACTTCGATTCCCTCGCCGATCTCCGACGGAGCCGTGTGCAT" )) { print(paste("QUERY:",s)); hits<-bwa.map(bwt,s) print(hits) } bwa.close(bwt);here is the R output:
> source("rbwa.R") > > bwt <- bwa.open("test_files/chrM.fa") > > for(s in c( + "GCATGTGTAATCTTACTAAGAGCTAATAGAAAGGCTAGGACCAAACCTAT", + "GCATGTGTAATCTTACTAAGCTAATAGAAAGGCTAGGACCAAACCTAT", + "CTATCTACTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTAT", + "TACTAAACCC", + "GCGAACCCAACTTCGATTCCCTCGCCGATCTCCGACGGAGCCGTGTGCAT" + )) + { + print(paste("QUERY:",s)); + hits<-bwa.map(bwt,s) + print(hits) + } [1] "QUERY: GCATGTGTAATCTTACTAAGAGCTAATAGAAAGGCTAGGACCAAACCTAT" chrom pos strand mapq NM secondary 1 chrM 650 1 60 0 0 [1] "QUERY: GCATGTGTAATCTTACTAAGCTAATAGAAAGGCTAGGACCAAACCTAT" chrom pos strand mapq NM secondary 1 chrM 650 1 60 2 0 [1] "QUERY: CTATCTACTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTAT" chrom pos strand mapq NM secondary 1 chrM 3100 0 60 4 0 [1] "QUERY: TACTAAACCC" [1] chrom pos strand mapq NM secondary <0 rows> (or 0-length row.names) [1] "QUERY: GCGAACCCAACTTCGATTCCCTCGCCGATCTCCGACGGAGCCGTGTGCAT" [1] chrom pos strand mapq NM secondary <0 rows> (or 0-length row.names) > > bwa.close(bwt); [1] TRUE
That's it,
Pierre