writing #rstats bindings for bwa-mem, my notebook.
I wanted to learn how to bind a C library to R, so I've created the following bindings for BWA. My code is available on github at :
Most of the C code was inspired from Heng Li's code https://github.com/lh3/bwa/blob/master/example.c.A short description of the C code
In https://github.com/lindenb/rbwa/blob/master/rbwa.c:- struct RBwaHandler
- This structure holds a pointer to the bwa-index (bwaidx_t) and to the options for bwa (mem_opt_t).
- RBwaOpen(filename)
- This methods opens the bwa index, wrap the pointer in a 'R' tructure using R_MakeExternalPtrand registers a destructor '_RBwaFinalizer' to be called by the garbage manager.
- _RBwaFinalizer(handler)
- This is the destructor called by the garbage manager. It calls 'RBwaClose'
- RBwaClose(handler)
- retrieves the pointer  to the 'struct RBwaHandler' using 'R_ExternalPtrAddr', disposes the resources, free theRBwaHandlerusing 'R_ClearExternalPtr'
- RBwaMap(handler,DNAsequence)
- This is the workhorse of the code: it retrieves the pointer to the 'struct RBwaHandler' using 'R_ExternalPtrAddr', creates a "data.frame" with 6 columns (chromosome, position, strand, mapq, NM, secondary), maps the DNA sequence by calling bwa::mem_align1 and insert the hits in the data.frame.
The R code
See https://github.com/lindenb/rbwa/blob/master/rbwa.R. This R code loads the dynamic C libraries and declares the R functions calling the previous C functions using .Call:- bwa.open(filename)
- bwa.close(bwt)
- bwa.map(bwt,dnasequence)
Example
source("rbwa.R")
bwt <- bwa.open("test_files/chrM.fa")
for(s in c(
 "GCATGTGTAATCTTACTAAGAGCTAATAGAAAGGCTAGGACCAAACCTAT",
 "GCATGTGTAATCTTACTAAGCTAATAGAAAGGCTAGGACCAAACCTAT",
 "CTATCTACTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTAT",
 "TACTAAACCC",
 "GCGAACCCAACTTCGATTCCCTCGCCGATCTCCGACGGAGCCGTGTGCAT" 
 ))
 {
 print(paste("QUERY:",s));
 hits<-bwa.map(bwt,s)
 print(hits)
 }
bwa.close(bwt);
here is the R output:> source("rbwa.R")
> 
> bwt <- bwa.open("test_files/chrM.fa")
> 
> for(s in c(
+   "GCATGTGTAATCTTACTAAGAGCTAATAGAAAGGCTAGGACCAAACCTAT",
+   "GCATGTGTAATCTTACTAAGCTAATAGAAAGGCTAGGACCAAACCTAT",
+   "CTATCTACTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTAT",
+   "TACTAAACCC",
+   "GCGAACCCAACTTCGATTCCCTCGCCGATCTCCGACGGAGCCGTGTGCAT" 
+   ))
+  {
+  print(paste("QUERY:",s));
+  hits<-bwa.map(bwt,s)
+  print(hits)
+  }
[1] "QUERY: GCATGTGTAATCTTACTAAGAGCTAATAGAAAGGCTAGGACCAAACCTAT"
  chrom pos strand mapq NM secondary
1  chrM 650      1   60  0         0
[1] "QUERY: GCATGTGTAATCTTACTAAGCTAATAGAAAGGCTAGGACCAAACCTAT"
  chrom pos strand mapq NM secondary
1  chrM 650      1   60  2         0
[1] "QUERY: CTATCTACTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTAT"
  chrom  pos strand mapq NM secondary
1  chrM 3100      0   60  4         0
[1] "QUERY: TACTAAACCC"
[1] chrom     pos       strand    mapq      NM        secondary
<0 rows> (or 0-length row.names)
[1] "QUERY: GCGAACCCAACTTCGATTCCCTCGCCGATCTCCGACGGAGCCGTGTGCAT"
[1] chrom     pos       strand    mapq      NM        secondary
<0 rows> (or 0-length row.names)
> 
> bwa.close(bwt);
[1] TRUE
That's it,
Pierre
 

1 comment:
Nice mini-tut. Thanks!
Post a Comment