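A std::streambuf has a virtual member function named underflow(). The stream calls it when all the characters already in the buffer have been consumed, in order to make additional characters available for reading. It returns the next available character without consuming it, or EOF when nothing is left.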
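To make the underflow() contract concrete, here is a minimal sketch that is not taken from the cclindenb repository: a hypothetical letters_streambuf that hands out the characters of a string one at a time. Each call to underflow() refills a one-character get area with setg() and returns the next character without consuming it, or traits_type::eof() when the source is exhausted.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical illustration class (not part of cclindenb): serves the characters
// of a string one at a time through a one-character get area.
class letters_streambuf : public std::streambuf {
public:
    // The base-class constructor leaves the get area empty (null pointers),
    // so the first read from the stream calls underflow().
    explicit letters_streambuf(std::string letters)
        : letters_(std::move(letters)) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())                      // unread data still in the buffer
            return traits_type::to_int_type(*gptr());
        if (pos_ >= letters_.size())               // source exhausted: report end of stream
            return traits_type::eof();
        current_ = letters_[pos_++];               // refill the one-character buffer
        setg(&current_, &current_, &current_ + 1); // eback, gptr, egptr
        return traits_type::to_int_type(*gptr());  // peek at it, do not consume it
    }

private:
    std::string letters_;
    std::size_t pos_ = 0;
    char current_ = '\0';
};

// Usage: wrap the buffer in a std::istream and extract one word.
inline std::string read_word(const std::string& letters) {
    letters_streambuf buf(letters);
    std::istream in(&buf);
    std::string word;
    in >> word;  // stops at whitespace or at the end of the stream
    return word;
}
```

The std::istream never sees how the characters are produced; it only relies on underflow() and the get-area pointers.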
A first streambuf: reading the content of a URL
My first class extends std::streambuf and wraps the CURL C API. The source code is available on GitHub at http://github.com/lindenb/cclindenb/blob/master/src/core/lindenb/net/curl_streambuf.h. Each time underflow is called, the CURL API is asked to download a new chunk of data from the URL, and that chunk becomes the new buffer of this streambuf.
Usage
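#include <iostream>
#include <istream>
#include <string>
#include "lindenb/net/curl_streambuf.h"
static const char *url1="http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:100000,100100";
static void test0001()
{
    // the streambuf downloads the URL chunk by chunk, on demand
    lindenb::net::curl_streambuf curl(url1);
    // a plain std::istream reads from it like from any other source
    std::istream in(&curl);
    std::string line;
    while(std::getline(in,line,'\n'))
    {
        std::cout << line << std::endl;
    }
}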
Output
<?xml version="1.0" standalone="no"?>
<!DOCTYPE DASDNA SYSTEM "http://www.biodas.org/dtd/dasdna.dtd">
<DASDNA>
<SEQUENCE id="chr1" start="100000" stop="100100" version="1.00">
<DNA length="101">
cactaagcacacagagaataatgtctagaatctgagtgccatgttatcaa
attgtactgagactcttgcagtcacacaggctgacatgtaagcatcgcca
t
</DNA>
</SEQUENCE>
</DASDNA>
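The "one downloaded chunk per call to underflow()" idea used by curl_streambuf can be sketched without a network connection. In the hypothetical chunk_streambuf below, a std::function (an assumption standing in for the CURL transfer; this is not the real curl_streambuf code) returns the next chunk, or an empty string when the transfer is over, and that std::string directly becomes the get area.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the chunk-per-underflow() pattern. The chunk source is
// an assumed std::function standing in for the CURL transfer: it returns the
// next chunk of bytes, or an empty string once the download is finished.
class chunk_streambuf : public std::streambuf {
public:
    using chunk_source = std::function<std::string()>;
    explicit chunk_streambuf(chunk_source next) : next_(std::move(next)) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        chunk_ = next_();                  // "download" the next chunk
        if (chunk_.empty()) {              // empty chunk means end of transfer
            setg(nullptr, nullptr, nullptr); // drop pointers into the old chunk
            return traits_type::eof();
        }
        char* begin = &chunk_[0];
        // The chunk itself becomes the get area: no copy into a second buffer.
        setg(begin, begin, begin + chunk_.size());
        return traits_type::to_int_type(*gptr());
    }

private:
    chunk_source next_;
    std::string chunk_;  // owns the bytes the get area points into
};

// Usage: hard-coded chunks, read back line by line with std::getline.
inline std::string join_lines(const std::vector<std::string>& chunks) {
    std::size_t i = 0;
    chunk_streambuf buf([&]() { return i < chunks.size() ? chunks[i++] : std::string(); });
    std::istream in(&buf);
    std::string line, joined;
    while (std::getline(in, line))
        joined += line + "|";
    return joined;
}
```

Because std::getline only talks to the std::istream, a line that straddles a chunk boundary (for example "ca" followed by "ct\n...") is reassembled transparently.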
A second streambuf: extracting the DNA from a BioDAS DNA XML document
For this second streambuf, I've used the pull parser of libxml (the xmlTextReader API). This class is available at http://github.com/lindenb/cclindenb/blob/master/src/core/lindenb/bio/das/dna_streambuf.h. The first time underflow is called, the streambuf creates a new xmlTextReaderPtr and skips all the XML elements until it finds the opening tag <DNA>. During the remaining calls to underflow, the internal buffer is then filled with the text content of the DNA element, until the parser reaches the closing tag </DNA>.
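The two-phase behaviour described above (skip to <DNA> on the first underflow, then stream the element's text) can be sketched with a plain character scan. The naive_dna_streambuf below is hypothetical and deliberately swaps libxml's xmlTextReader for a naive scan, so it ignores comments, CDATA sections and entities; it also drops whitespace so that the sequence comes out on a single line.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical sketch (naive text scan, NOT libxml's xmlTextReader): a filtering
// streambuf reading from an upstream std::istream. The first underflow() skips
// everything up to the opening <DNA ...> tag; later calls serve the
// non-whitespace characters of the element until the next '<' (i.e. </DNA>).
class naive_dna_streambuf : public std::streambuf {
public:
    explicit naive_dna_streambuf(std::istream& in) : in_(in) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        if (done_)
            return traits_type::eof();
        if (!inside_) {
            inside_ = skip_to_dna_tag();       // first call: move past <DNA ...>
            if (!inside_) { done_ = true; return traits_type::eof(); }
        }
        std::size_t n = 0;                     // refill up to sizeof(buf_) bases
        while (n < sizeof(buf_)) {
            int c = in_.get();
            if (c == traits_type::eof() || c == '<') { done_ = true; break; }
            if (!std::isspace(c)) buf_[n++] = static_cast<char>(c);
        }
        if (n == 0)
            return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    // Consume input up to and including the '>' of an opening <DNA ...> tag.
    bool skip_to_dna_tag() {
        const int eof = traits_type::eof();
        for (int c = in_.get(); c != eof; c = in_.get()) {
            if (c != '<') continue;
            std::string name;
            while ((c = in_.get()) != eof && c != '>' && c != '/' && !std::isspace(c))
                name += static_cast<char>(c);
            if (name != "DNA") continue;
            while (c != eof && c != '>') c = in_.get();  // skip the attributes
            return c == '>';
        }
        return false;
    }

    std::istream& in_;
    bool inside_ = false;
    bool done_ = false;
    char buf_[64];
};

// Usage: chain the filter on top of any std::istream (here an in-memory one).
inline std::string extract_dna(const std::string& xml) {
    std::istringstream upstream(xml);
    naive_dna_streambuf dna(upstream);
    std::istream in(&dna);
    std::string seq;
    char c;
    while (in.get(c))
        seq += c;
    return seq;
}
```

Since the filter only depends on std::istream, it could sit on top of curl_streambuf in the same way dna_streambuf does in test0002.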
Usage
#include <iostream>
#include <istream>
#include "lindenb/bio/das/dna_streambuf.h"
#include "lindenb/net/curl_streambuf.h"
static const char *url1="http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:100000,100100";
static void test0002()
{
    // first layer: the raw XML downloaded from the DAS server
    lindenb::net::curl_streambuf curl(url1);
    std::istream in_net(&curl);
    // second layer: the content of <DNA>, read from the first stream
    lindenb::bio::das::dna_streambuf dasdna(in_net);
    std::istream in_das(&dasdna);
    int c;
    while((c=in_das.get())!=std::istream::traits_type::eof())
    {
        std::cout << static_cast<char>(c);
    }
}
Output
cactaagcacacagagaataatgtctagaatctgagtgccatgttatcaaattgtactgagactcttgcagtcacacaggctgacatgtaagcatcgccat
That's it
Pierre