28 February 2015

Integrating a java program in #usegalaxy.

This is my notebook for the integration of java programs in https://usegalaxy.org/ .

create a directory for your tools under ${galaxy-root}/tools

mkdir ${galaxy-root}/tools/jvarkit

put all the required jar files and the XML files describing your tools (see below) in this new directory:

$ ls ${galaxy-root}/tools/jvarkit/
commons-jexl-2.1.1.jar
groupbygene.jar
htsjdk-1.128.jar
vcffilterjs.jar
vcffilterso.jar
vcfhead.jar
vcftail.jar
vcftrio.jar
commons-logging-1.1.1.jar
groupbygene.xml
snappy-java-1.0.3-rc3.jar
vcffilterjs.xml
vcffilterso.xml
vcfhead.xml
vcftail.xml
vcftrio.xml

Each tool is described with a XML file whose schema is available at: https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax . For example here is a simple file describing the tool VcfHead which prints the very first variants of a VCF file:

<?xml version="1.0"?>
<tool id="com.github.lindenb.jvarkit.tools.misc.VcfHead" version="1.0.0" name="vcfhead">
  <description>Print first variants of a VCF</description>
  <requirements>
    <requirement type="binary">java</requirement>
  </requirements>
  <command>(gunzip -c ${input} || cat ${input}) | java -cp $__tool_directory__/commons-jexl-2.1.1.jar:$__tool_directory__/commons-logging-1.1.1.jar:$__tool_directory__/htsjdk-1.128.jar:$__tool_directory__/snappy-java-1.0.3-rc3.jar:$__tool_directory__/vcfhead.jar com.github.lindenb.jvarkit.tools.misc.VcfHead -n '${num}' -o ${output}.vcf.gz &amp;&amp; mv ${output}.vcf.gz ${output}</command>
  <inputs>
    <param format="vcf" name="input" type="data" label="VCF input"/>
    <param name="num" type="integer" label="Number of variants" min="0" value="10"/>
  </inputs>
  <outputs>
    <data format="vcf" name="output"/>
  </outputs>
  <stdio>
    <exit_code range="1:"/>
    <exit_code range=":-1"/>
  </stdio>
  <help/>
</tool>

The input file is described by

<param format="vcf" name="input" type="data" label="VCF input"/>

The number of lines is declared in:

<param name="num" type="integer" label="Number of variants" min="0" value="10"/>

Those two variables will be replaced in the command line at runtime by galaxy.
The command line is

(gunzip -c ${input} || cat ${input}) | \
java -cp $__tool_directory__/commons-jexl-2.1.1.jar:$__tool_directory__/commons-logging-1.1.1.jar:$__tool_directory__/htsjdk-1.128.jar:$__tool_directory__/snappy-java-1.0.3-rc3.jar:$__tool_directory__/vcfhead.jar \
 com.github.lindenb.jvarkit.tools.misc.VcfHead \
 -n '${num}' -o ${output}.vcf.gz  &amp;&amp; \
mv ${output}.vcf.gz ${output}

it starts with (gunzip -c ${input} || cat ${input}) because we don't know if the input will be gzipped.
The main problem, here, is to set the CLASSPATH and tell java where to find the jar libraries. With the help of @pjacock and @jmchilton I learned that the recent release of galaxy defines a variable $__tool_directory__ defining the location of your repository, so you'll just have to append the jar file to this variable:

$__tool_directory__/commons-jexl-2.1.1.jar:$__tool_directory__/commons-logging-1.1.1.jar:....

You'll need to declare the new tools in ${galaxy-root}/config/tool_conf.xml

(...)
</section>
<section id="jvk" name="JVARKIT">
<tool file="jvarkit/vcffilterjs.xml"/>
<tool file="jvarkit/vcfhead.xml"/>
<tool file="jvarkit/vcftail.xml"/>
<tool file="jvarkit/vcffilterso.xml"/>
<tool file="jvarkit/vcftrio.xml"/>
<tool file="jvarkit/vcfgroupbygene.xml"/>
</section>
</toolbox>

Your tools are now available in the 'tools' menu

leftmenu

Clicking on a link will make galaxy displaying a form for your tool:

screenshot

As far as I can see for now, making a tar archive of your tool directory and uploading it in the galaxy toolshed ( https://toolshed.g2.bx.psu.edu/repository ), will make your tools available to the scientific community.

toolshed

That's it,
Pierre

22 February 2015

Drawing a Manhattan plot in SVG using a GWAS+XML model.

On friday, I saw my colleague @b_l_k starting writing SVG+XML code to draw a Manhattan plot. I told him that a better idea would be to describe the data using XML and to transform the XML to SVG using XSLT.

So, let's do this. I put the XSLT stylesheet on github at https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/manhattan.xsl . And the model of data would look like this (I took the data from @genetics_blog's http://www.gettinggeneticsdone.com/2011/04/annotated-manhattan-plots-and-qq-plots.html ) .

<?xml version="1.0"?>
<manhattan>
  <chromosome name="1" length="1500">
    <data rs="1" position="1" p="0.914806043496355"/>
    <data rs="2" position="2" p="0.937075413297862"/>
    <data rs="3" position="3" p="0.286139534786344"/>
    <data rs="4" position="4" p="0.830447626067325"/>
    <data rs="5" position="5" p="0.641745518893003"/>
 (...)
  </chromosome>
  <chromosome name="22" length="535">
    <data rs="15936" position="1" p="0.367785102687776"/>
    <data rs="15937" position="2" p="0.628192085539922"/>
(...)
    <data rs="1" position="1" p="0.914806043496355"/>
  </chromosome>
</manhattan>

The stylesheet

At the beginning , we declare the size of the drawing

<xsl:variable name="width" select="number(1000)"/>
<xsl:variable name="height" select="number(400)"/>

we need to get the size of the genome.

<xsl:variable name="genomeSize">
    <xsl:call-template name="sumLengthChrom">
      <xsl:with-param name="length" select="number(0)"/>
      <xsl:with-param name="node" select="manhattan/chromosome[1]"/>
    </xsl:call-template>
</xsl:variable>

We could use the xpath function 'sum()' but here I the user is free to omit the size of the chromosome. It this attribute '@length' is not declared, we use the maximum position of the SNP in this chromosome:

<xsl:template name="sumLengthChrom">
    <xsl:param name="length"/>
    <xsl:param name="node"/>
    <xsl:variable name="chromlen">
      <xsl:apply-templates select="$node" mode="length"/>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="count($node/following-sibling::chromosome)&gt;0">
        <xsl:call-template name="sumLengthChrom">
          <xsl:with-param name="length" select="$length + number($chromlen)"/>
          <xsl:with-param name="node" select="$node/following-sibling::chromosome[1]"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$length + number($chromlen)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

we get the smallest p-value:

<xsl:variable name="minpvalue">
    <xsl:for-each select="manhattan/chromosome/data">
      <xsl:sort select="@p" data-type="number" order="ascending"/>
      <xsl:if test="position() = 1">
        <xsl:value-of select="number(@p)"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>

then we plot each chromosome, the xsl parameter "previous" contains the number of bases already printed.
We use the SVG attribute transform to translate the current chromosome in the drawing

<xsl:template name="plotChromosomes">
    <xsl:param name="previous"/>
    <xsl:param name="node"/>
(...)
      <xsl:attribute name="transform">
        <xsl:text>translate(</xsl:text>
        <xsl:value-of select="(number($previous) div number($genomeSize)) * $width"/>
        <xsl:text>,0)</xsl:text>
      </xsl:attribute>

we plot each SNP:

<svg:g style="fill-opacity:0.5;">
        <xsl:apply-templates select="data" mode="plot"/>
      </svg:g>

and we plot the remaining chromosomes, if any :

<xsl:if test="count($node/following-sibling::chromosome)&gt;0">
      <xsl:call-template name="plotChromosomes">
        <xsl:with-param name="previous" select="$previous + number($chromlen)"/>
        <xsl:with-param name="node" select="$node/following-sibling::chromosome[1]"/>
      </xsl:call-template>
    </xsl:if>

to plot each SNP, we get the X coordinate in the current chromosome:

<xsl:template match="data" mode="x">
    <xsl:variable name="chromWidth">
      <xsl:apply-templates select=".." mode="width"/>
    </xsl:variable>
    <xsl:variable name="chromLength">
      <xsl:apply-templates select=".." mode="length"/>
    </xsl:variable>
    <xsl:value-of select="(number(@position) div number($chromLength)) * $chromWidth"/>
  </xsl:template>

and the Y coordinate:

<xsl:template match="data" mode="y">
    <xsl:value-of select="$height - (( (math:log(number(@p)) * -1 ) div $maxlog2value ) * $height )"/>
  </xsl:template>

we can also wrap the data in a hyperlink if a @rs attribute exists:

<xsl:choose>
      <xsl:when test="@rs">
        <svg:a target="_blank">
          <xsl:attribute name="xlink:href">
            <xsl:value-of select="concat('http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=',@rs)"/>
          </xsl:attribute>
          <xsl:apply-templates select="." mode="plotshape"/>
        </svg:a>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="." mode="plotshape"/>
      </xsl:otherwise>
    </xsl:choose>

we plot the SNP itself, as a circle:

<xsl:template match="data" mode="plotshape">
    <svg:circle r="5">
      <xsl:attribute name="cx">
        <xsl:apply-templates select="." mode="x"/>
      </xsl:attribute>
      <xsl:attribute name="cy">
        <xsl:apply-templates select="." mode="y"/>
      </xsl:attribute>
    </svg:circle>
  </xsl:template>

Result:

$ xsltproc manhattan.xsl model.xml > plot.svg

Plot

That's it
Pierre

18 February 2015

Automatic code generation for @knime with XSLT: An example with two nodes: fasta reader and writer.

KNIME is a java+eclipse-based graphical workflow-manager.

Biologists in my lab often use this tool to filter VCFs or other tabular data. A software Development kit (SDK) is provided to build new nodes. My main problem with this SDK is, that you need to write a large number of similar files and you also have to interact with a graphical interface. I wanted to automatize the generation of java code for new node. In the following post I will describe how I wrote two new node for reading and writing fasta files.

The nodes are described in a XML file and the java code is generated with a XSLT stylesheet and is available on github at:

https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/knime/knime2java.xsl

Example

We're going to create two nodes for FASTA:

a fasta reader
a fasta writer

We define a plugin.xml file, it uses xinclude to include the definition of the two nodes. The base package of will be com.github.lindenb.xsltsandbox . The nodes will be displayed in the knime-workbench under /community/bio/fasta

<?xml version="1.0" encoding="UTF-8"?>
<plugin xmlns:xi="http://www.w3.org/2001/XInclude"
        package="com.github.lindenb.xsltsandbox"
        >
  <category name="bio">
    <category name="fasta" label="Fasta">
      <xi:include href="node.read-fasta.xml"/>
      <xi:include href="node.write-fasta.xml"/>
    </category>
  </category>
</plugin>

node.read-fasta.xml : it takes a FileReader (for the input fasta file ) and an integer to limit the number of fasta sequences to be read. The outpout will be a table with two columns (name/sequence). We only write the code for reading the fasta file.

<?xml version="1.0" encoding="UTF-8"?>
<node name="readfasta" label="Read Fasta" description="Reads a Fasta file">
  <property type="file-read" name="fastaIn">
    <extension>.fa</extension>
    <extension>.fasta</extension>
    <extension>.fasta.gz</extension>
    <extension>.fa.gz</extension>
  </property>
  <property type="int" name="limit" label="max sequences" description="number of sequences to be fetch. 0 = ALL" default="0">
  </property>
  <property type="bool" name="upper" label="Uppercase" description="Convert to Uppercase" default="false">
  </property>
  <outPort name="output">
    <column name="title" label="Title" type="string"/>
    <column name="sequence" label="Sequence" type="string"/>
  </outPort>
  <code>
    <import>
      import java.io.*;
      </import>
    <body>
    @Override
    protected BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
        {
        int limit = this.getPropertyLimitValue();
        String url = this.getPropertyFastaInValue();
        boolean to_upper = this.getPropertyUpperValue();
        getLogger().info("reading "+url);
        java.io.BufferedReader r= null;
        int n_sequences = 0;
        try
            {
            r = this.openUriForBufferedReader(url);

            DataTableSpec dataspec0 = this.createOutTableSpec0();
            BufferedDataContainer container0 = exec.createDataContainer(dataspec0);

            String seqname="";
            StringBuilder sequence=new StringBuilder();
            for(;;)
                {
                exec.checkCanceled();
                exec.setMessage("Sequences "+n_sequences);
                String line= r.readLine();
                if(line==null || line.startsWith("&gt;"))
                    {
                    if(!(sequence.length()==0 &amp;&amp; seqname.trim().isEmpty()))
                        {
                          container0.addRowToTable(new  org.knime.core.data.def.DefaultRow(
                          org.knime.core.data.RowKey.createRowKey(n_sequences),
                        this.createDataCellsForOutTableSpec0(seqname,sequence)
                                                                                                                                    ));
                        ++n_sequences;
                        }
                    if(line==null) break;
                    if( limit!=0 &amp;&amp; limit==n_sequences) break;
                    seqname=line.substring(1);
                    sequence=new StringBuilder();
                    }
                else
                    {
                    line= line.trim();
                    if( to_upper ) line= line.toUpperCase();
                    sequence.append(line);
                    }
                }
            container0.close();
            BufferedDataTable out0 = container0.getTable();
            return new BufferedDataTable[]{out0};
            }
        finally
            {
            r.close();
            }
        }
      </body>
  </code>
</node>

node.write-fasta.xml : it needs an input dataTable with two column (name/sequence), an integer to set the lentgh of the lines and requires a file-writer to write the fasta file.

<?xml version="1.0" encoding="UTF-8"?>
<node name="writefasta" label="Write Fasta" description="Write a Fasta file">
  <inPort name="input">
  </inPort>
  <property type="file-save" name="fastaOut">
  </property>
  <property type="column" name="title" label="Title" description="Fasta title" data-type="string">
  </property>
  <property type="column" name="sequence" label="Sequence" description="Fasta Sequence" data-type="string">
  </property>
  <property type="int" name="fold" label="Fold size" description="Fold sequences greater than..." default="60">
  </property>
  <code>
    <import>
      import org.knime.core.data.container.CloseableRowIterator;
      import java.io.*;
      </import>
    <body>
    @Override
    protected BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
        {
        CloseableRowIterator iter=null;
        BufferedDataTable inTable=inData[0];
        int fold = this.getPropertyFoldValue();
        int tIndex = this.findTitleRequiredColumnIndex(inTable.getDataTableSpec());
        int sIndex = this.findSequenceRequiredColumnIndex(inTable.getDataTableSpec());
        PrintWriter w =null;
        try
            {
            w= openFastaOutForPrinting();

            int nRows=0;
            double total=inTable.getRowCount();
            iter=inTable.iterator();
            while(iter.hasNext())
                {
                DataRow row=iter.next();
                DataCell tCell =row.getCell(tIndex);

                DataCell sCell =row.getCell(sIndex);

                w.print("&gt;");
                if(!tCell.isMissing())
                    { 
                    w.print(StringCell.class.cast(tCell).getStringValue());
                    }
                if(!sCell.isMissing())
                    {
                    String sequence = StringCell.class.cast(sCell).getStringValue();
                    for(int i=0;i&lt;sequence.length();++i)
                        {
                        if(i%fold == 0) w.println();
                        w.print(sequence.charAt(i));
                        exec.checkCanceled();
                        }
                    }
                w.println();

                exec.checkCanceled();
                exec.setProgress(nRows/total,"Saving Fasta");
                ++nRows;
                }
            w.flush();
            return new BufferedDataTable[0];
            }
        finally
            {
            if(w!=null) w.close();
            }
        }
      </body>
  </code>
</node>

The following Makefile generates the code, compiles and installs the new plugin in the ${knime.root}/plugins directory :

.PHONY:all clean install run

knime.root=${HOME}/package/knime_2.11.2

all: install

run: install
        ${knime.root}/knime -clean

install:
        rm -rf generated
        xsltproc --xinclude \
                --stringparam base.dir generated \
                knime2java.xsl plugin.xml
        $(MAKE) -C generated install knime.root=${knime.root}


clean:
        rm -rf generated

The code generated by this Makefile:

$ find generated/ -type f
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeFactory.xml
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodePlugin.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeFactory.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeDialog.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/AbstractReadfastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/readfasta/ReadfastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeFactory.xml
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodePlugin.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeFactory.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeDialog.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/AbstractWritefastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/bio/fasta/writefasta/WritefastaNodeModel.java
generated/src/com/github/lindenb/xsltsandbox/CompileAll__.java
generated/src/com/github/lindenb/xsltsandbox/AbstractNodeModel.java
generated/MANIFEST.MF
generated/Makefile
generated/plugin.xml
generated/dist/com_github_lindenb_xsltsandbox.jar
generated/dist/com.github.lindenb.xsltsandbox_2015.02.18.jar

The file generated/dist/com.github.lindenb.xsltsandbox_2015.02.18.jar is the file to move to ${knime.root}/plugins

(At the time of writing I put the jar at http://cardioserve.nantes.inserm.fr/~lindenb/knime/fasta/ )

open knime, the new nodes are now displayed in the Node Repository

Node Repository

You can now use the nodes, the code is displayed in the documentation:

Workbench

That's it,

Pierre

02 February 2015

Listing the 'Subject' Sequences in a BLAST database using the NCBI C++ toolbox. My notebook.

In my previous post (http://plindenbaum.blogspot.com/2015/01/filtering-fasta-sequences-using-ncbi-c.html) I've built an application filtering FASTA sequences using the
NCBI C++ toolbox (http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/). Here, I'm gonna write a tool listing the 'subject' sequences in a BLAST database.

This new application ListBlastDatabaseContent takes only one argument '-db', the name of the blast database :

void ListBlastDatabaseContent::Init()
    {
    (...)
    /* argument for name */
    arg_desc->AddDefaultKey(
            "db",/* name */
            "database",/* synopsis */
            "Blast database",/* comment */
            CArgDescriptions::eString, /* argument type */
            "" /* default value*/
            );
    (...)
    }

in ListBlastDatabaseContent::Run() we get the name of the database

(...)
string database_name(  this->GetArgs()["db"].AsString() );
(...)

and we open a new CSearchDatabase object ("Blast Search Subject")

CSearchDatabase* searchDatabase = new CSearchDatabase(
    database_name, /* db name */
    CSearchDatabase::eBlastDbIsNucleotide /* db type */
    );

We get a handle to the seqdb, "it defines access to the database component by calling methods on objects which represent the various database files, such as the index, header, sequence, and alias files".

CRef<CSeqDB> seqdb= searchDatabase->GetSeqDb();

We get an iterator to the seqdb database http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/classCSeqDBIter.html:

CSeqDBIter  iter = seqdb->Begin();

While this iterator is valid, we retrieve and print the DNA sequence and the header associated to the iterator.

while(iter)
    {
    string output;
    TSeqRange range; /* entire sequence */
    seqdb->GetSequenceAsString(iter.GetOID(),output,range); 
    CRef< CBlast_def_line_set > def_line= seqdb->GetHdr(iter.GetOID());
    CBlast_def_line_set:: Tdata::const_iterator r = def_line->Get().begin();
    while(r!= def_line->Get().end())
        {
        cout << ">" << (*r)->GetTitle()  << endl;
        cout << output << endl;
        ++r;
        }
    ++iter;
    }

All in one:

#include <memory>
#include <limits>
#include "corelib/ncbiapp.hpp"
#include <algo/blast/api/blast_types.hpp>
#include <algo/blast/api/sseqloc.hpp>
#include <algo/blast/api/blast_aux.hpp>
#include <algo/blast/api/blast_options_handle.hpp>
#include <algo/blast/api/blast_results.hpp>
#include <algo/blast/api/local_blast.hpp>
#include <ctype.h>



USING_NCBI_SCOPE;
using namespace blast;

/**
 *  Name
 *      ListBlastDatabaseContent
 *  Motivation:
 *      Print content of blast database
 *  Author:
 *      Pierre Lindenbaum PhD 2015
 *
 */
class ListBlastDatabaseContent : public CNcbiApplication /* extends a basic NCBI application defined in c++/include/corelib/ncbiapp.hpp */
    {
    public:

        /* constructor, just set the version to '1.0.0'*/
        ListBlastDatabaseContent()
            {
            CRef<CVersion> version(new CVersion());
            version->SetVersionInfo(1, 0, 0);
            this->SetFullVersion(version);
            }
            
            
        /* called to initialize rge program.
        *  The default behavior of this in 'CNcbiApplication' is "do nothing".
        *  Here, we set the command line arguments.
        */
        virtual void Init()
            {
            CArgDescriptions* arg_desc = new CArgDescriptions ; /* defined in /c++/include/corelib/ncbiargs.hpp */
            arg_desc->SetUsageContext(
                GetArguments().GetProgramBasename(),
                __FILE__
                );
            
         
            /* argument for name */
           arg_desc->AddDefaultKey(
                    "db",/* name */
                    "database",/* synopsis */
                    "Blast database",/* comment */
                    CArgDescriptions::eString, /* argument type */
                    "" /* default value*/
                    );
            
            /* push this command args */
            SetupArgDescriptions( arg_desc );
            }   
        
        /* class destructor */
        virtual ~ListBlastDatabaseContent()
            {
            }
        
        /* workhorse of the program */
        virtual int Run()
            {
            string database_name(  this->GetArgs()["db"].AsString() );
            /* Blast Search Subject.  http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/classCSearchDatabase.html#details */
            CSearchDatabase* searchDatabase = 0;
            
        
            /* initialize search database */
            searchDatabase = new CSearchDatabase(
                database_name, /* db name */
                CSearchDatabase::eBlastDbIsNucleotide /* db type */
                );
            
            /* This class provides the top-level interface class for BLAST database users. It 
            defines access to the database component by calling methods on objects which represent
            the various database files, such as the index, header, sequence, and alias files. 
            http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/classCSeqDB.html */
            CRef<CSeqDB> seqdb= searchDatabase->GetSeqDb();                 
            
            /* Small class to iterate over a seqdb database. http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/classCSeqDBIter.html#details */
            CSeqDBIter  iter = seqdb->Begin();
            
            while(iter)
                {
                /* we put the sequence here */
                 string output;
                 /* retrieve the entire sequence */
                 TSeqRange range;
                /* This method gets the sequence data for an OID, converts it to a human-readable encoding (either Iupacaa for protein, or Iupacna for nucleotide), and returns it in a string. This is equivalent to calling the three-argument versions of this method with those encodings. http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/classCSeqDB.html */
                seqdb->GetSequenceAsString(iter.GetOID(),output,range); 
                /* Represents ASN.1 type Blast-def-line-set defined in file blastdb.asn http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/classCBlast__def__line__set.html */
                CRef< CBlast_def_line_set > def_line= seqdb->GetHdr(iter.GetOID());
                CBlast_def_line_set:: Tdata::const_iterator r = def_line->Get().begin();
                while(r!= def_line->Get().end())
                    {
                    cout << ">" << (*r)->GetTitle()  << endl;
                    cout << output << endl;
                    ++r;
                    }

                ++iter;
                }
            if( searchDatabase != 0) delete searchDatabase;
            return 0;
            }
    };

int main(int argc,char** argv)
    {
    return ListBlastDatabaseContent().AppMain(
        argc,
        argv,
        0, /* envp Environment pointer */
        eDS_ToStderr,/* log message. In /c++/include/corelib/ncbidiag.hpp  */
        0, /* Specify registry to load, as per LoadConfig() */
        "loadblastdb" /* Specify application name */
        );
    }

Testing

get some fasta sequences, remove the headers and create a blast database.

$ curl  "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=281381068,281381067,281381066,281381065&rettype=fasta" |\
        awk '/^>/ {printf(">%d\n",NR);next;} {print;}' > sequences.fa

$ makeblastdb -in sequences.fa -dbtype nucl

run the program:

$ loadblastdb -db sequences.fa
>1
GATCGCGGGTATCTTCGATGTAGGGCGAGTCGCGTCACCCTGACCAACCCCTCGAGGTGGTCCTCGTGGAAGGAGGAAGCGCAGTCGACGAAGCGGAAGATGTGGCTGAGCCATCTCGTCGTGTCGGCCGCGAGACGACCCGCGAGTTTCTGCGCGTGTGACTTTGGTAGCGCTACTGCGTAAGCCGAGA
>6
CAGAAACTCGCGGGTCGTCTCGCGGCCGACACGACGAGATGGCTCAGCCACATCTTCCGCTTCGTCGACTGCGCTTCCTCCTTCCACGAGGACCACCTCGAGGGGTTGGTCAGGGTGACGCGACTC
>10
ACCGAAACGGCAAAATTGGCCTTAGTCACGCACACTATCTCGGCTTACGCAGTAGCGCTACCAAAGTCACACGCGCAGAAACTCGCGGGTCGTCTCGCGGCCGACACGACGAGATGGCTCAGCCACATCTTCCGCTTCGTCGACTGCGCTTCCTCCTTCCACGAGGACCACCTCGAGGGGTTGGTCAGGGTGACGCGACTCGCCCTACATCGAAGATACCCGCGATC
>16
GATCGAACGATAAACTTCTGCTATCGGTTCCTCAGCAATCAGTATAACGAGCGGCACATGTATTGCGGCATATTGCAAAAAACATACCTATAAAGAAGGCTAAACTGATTGATTCAAATTAACTTGAGAATACACGATGTTGATGTTCAGTCGAAACAAATCGCTCGCGTATTGTGATC

That's it,
Pierre

YOKOFAKUN