17 September 2008

Generating C code with Apache Velocity

I'm currently working on Operon (http://regulon.cng.fr/), a database developed by Mario Foglio at the National Center of Genotyping. The whole database storage layer is built on the Berkeley DB C API, and I've been asked to write a clean C API to access the data. Most of the data are stored as C structures, and I wanted to quickly write the methods to:
* create a new instance of each structure
* free the resources allocated by each structure
* create a vector of those structures with the common methods (addElement, removeElement, getSize, clear, etc.)
* etc.
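The vector methods in the last bullets can be sketched by hand for a single record type. The C below is a hypothetical illustration, not the generated Operon code: the SnpIds record (modelled on the SnpIds table described in XML below), the SnpIdsVector type, its function names and its non-owning design are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type; in the real API it would come from the XML description. */
typedef struct SnpIds_s {
    char* fid; /* SNP feature id */
    char* acn; /* SNP accession */
} SnpIds, *SnpIdsPtr;

/* Growable array of SnpIdsPtr. It does not own the records:
 * Clear() and Free() release only the array of pointers. */
typedef struct SnpIdsVector_s {
    SnpIdsPtr* data;
    size_t size;
    size_t capacity;
} SnpIdsVector, *SnpIdsVectorPtr;

/* Returns an empty vector (calloc zeroes data, size and capacity), or NULL. */
SnpIdsVectorPtr SnpIdsVectorNew(void) {
    return (SnpIdsVectorPtr)calloc(1, sizeof(SnpIdsVector));
}

size_t SnpIdsVectorGetSize(const SnpIdsVector* v) {
    assert(v != NULL);
    return v->size;
}

/* Appends e, doubling the capacity when full. Returns 0, or -1 if realloc fails. */
int SnpIdsVectorAddElement(SnpIdsVectorPtr v, SnpIdsPtr e) {
    assert(v != NULL);
    if (v->size == v->capacity) {
        size_t newCapacity = (v->capacity == 0) ? 8 : v->capacity * 2;
        SnpIdsPtr* p = (SnpIdsPtr*)realloc(v->data, newCapacity * sizeof(SnpIdsPtr));
        if (p == NULL) return -1; /* the old array is still valid */
        v->data = p;
        v->capacity = newCapacity;
    }
    v->data[v->size++] = e;
    return 0;
}

SnpIdsPtr SnpIdsVectorElementAt(const SnpIdsVector* v, size_t i) {
    assert(v != NULL && i < v->size);
    return v->data[i];
}

/* Removes the element at index i, shifts the following ones left, and returns it. */
SnpIdsPtr SnpIdsVectorRemoveElement(SnpIdsVectorPtr v, size_t i) {
    SnpIdsPtr removed;
    assert(v != NULL && i < v->size);
    removed = v->data[i];
    memmove(&v->data[i], &v->data[i + 1], (v->size - i - 1) * sizeof(SnpIdsPtr));
    v->size--;
    return removed;
}

/* Empties the vector but keeps the allocated capacity for reuse. */
void SnpIdsVectorClear(SnpIdsVectorPtr v) {
    assert(v != NULL);
    v->size = 0;
}

/* Releases the vector itself (NULL-safe); the records are left to the caller. */
void SnpIdsVectorFree(SnpIdsVectorPtr v) {
    if (v == NULL) return;
    free(v->data);
    free(v);
}
```

Because the vector only stores pointers, the caller keeps ownership of each record; a generated version could just as well free every element in clear() instead.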

I described a few of these structures in XML, something like this:

<?xml version="1.0" encoding="UTF-8"?>
<op:table name="SnpIds">
  <op:description>SNPIDS Berkeley Hash db: stores all SNP ids. The key for this
  database is the acn, and duplicate acn keys are allowed.</op:description>
  <op:column name="fid" type="char*">
    <op:description>fid: SNP feature id</op:description>
  </op:column>
  <op:column name="acn" type="char*">
    <op:description>acn: SNP accession</op:description>
  </op:column>
</op:table>

To generate the C code, I first tried XSLT, but I found the result too ugly.
I then looked for something like a standalone version of JavaServer Pages (JSP). I didn't find one (it would have been nice to reuse JSP custom tags).
Finally, I tried Apache Velocity (http://velocity.apache.org/), a Java template engine, and that is the technology I ended up using.

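Such C structures can be described by two Java interfaces:

// CField.java
public interface CField {
    public String getName();
    public String getType();
    public String getDescription(); // used by ${field.description} in the template
}

// CStructure.java
import java.util.Collection;

public interface CStructure {
    public Collection<CField> getFields();
    public String getName();
    public String getTypedef(); // used by $struct.typedef in the template
}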

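These objects are created by parsing the XML description of the structures, and each one is then bound to a name in the Velocity context (source code [here]):

// mystructure is the CStructure built from the XML description
VelocityContext context = new VelocityContext();
context.put("struct", mystructure);

The Velocity engine is then invoked; it uses Java reflection to resolve the template references (e.g. ${field.name} is resolved by calling getName()). For example, the following template: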
typedef struct $struct.typedef {
#foreach($field in ${struct.fields})
    /**
     * ${field.name}
     * ${field.description}
     */
    ${field.type} ${field.name};
#end
} ${struct.name}, *${struct.name}Ptr;
will generate the C header for this structure.
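For the SnpIds description above, the template would expand to something like the typedef below (the struct tag SnpIds_s assumes that $struct.typedef evaluates to that name, and the exact whitespace depends on Velocity). The constructor and destructor that follow are a hypothetical sketch of what the *.c template might produce for the "create" and "free" methods; the names SnpIdsNew and SnpIdsFree and the deep-copy policy are assumptions.

```c
#include <stdlib.h>
#include <string.h>

typedef struct SnpIds_s {
    /**
     * fid
     * fid: SNP feature id
     */
    char* fid;
    /**
     * acn
     * acn: SNP accession
     */
    char* acn;
} SnpIds, *SnpIdsPtr;

/* Hypothetical helper: portable strdup (strdup is not in C99). NULL-safe. */
static char* SnpIdsCopyString(const char* s) {
    char* copy;
    if (s == NULL) return NULL;
    copy = (char*)malloc(strlen(s) + 1);
    if (copy != NULL) strcpy(copy, s);
    return copy;
}

/* Hypothetical generated constructor: deep-copies every char* field.
 * Returns NULL if any allocation fails. */
SnpIdsPtr SnpIdsNew(const char* fid, const char* acn) {
    SnpIdsPtr p = (SnpIdsPtr)calloc(1, sizeof(SnpIds));
    if (p == NULL) return NULL;
    p->fid = SnpIdsCopyString(fid);
    p->acn = SnpIdsCopyString(acn);
    if ((fid != NULL && p->fid == NULL) || (acn != NULL && p->acn == NULL)) {
        free(p->fid);
        free(p->acn);
        free(p);
        return NULL;
    }
    return p;
}

/* Hypothetical generated destructor: frees each char* field, then the struct. */
void SnpIdsFree(SnpIdsPtr p) {
    if (p == NULL) return;
    free(p->fid);
    free(p->acn);
    free(p);
}
```

Since every column in the XML carries its C type, a template can emit one copy/free line per char* field, which is why this kind of boilerplate is a good fit for generation.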
The Velocity templates that generate the *.c and *.h files are available [here] and [here] (warning: this is a work in progress).

But that is not all: I also wanted to query each Berkeley DB database without rewriting code for every new kind of query. So I used Velocity to generate Flex/lex and Bison/yacc input files. From those, Flex and Bison generate a simple parser that builds a concrete syntax tree from a query string; the tree is then used to search the database:
/* parse the query string into a syntax tree */
YNodePtr search = myDatabaseParseQuery("AND(LT([chromEnd],10000),GT([chromStart],100))");
/* search the database with that tree */
myDatabaseArray array = myDatabaseSearch(search);
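The search step can be sketched as a recursive evaluation of the tree against each record. In the sketch below, the Feature record, the node layout, YNodeMatches and featureSearch are hypothetical; only the YNode/YNodePtr name and the AND/LT/GT operators come from the example query, and a linear scan over a plain array stands in for the Berkeley DB records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record holding the two columns used by the example query. */
typedef struct Feature_s {
    long chromStart;
    long chromEnd;
} Feature;

/* Only the operators shown in the example query. */
typedef enum { NODE_AND, NODE_LT, NODE_GT } NodeType;

/* One node of the query tree: an AND node uses left/right;
 * an LT/GT leaf compares a named column with a constant. */
typedef struct YNode_s {
    NodeType type;
    struct YNode_s* left;
    struct YNode_s* right;
    const char* column;
    long value;
} YNode, *YNodePtr;

/* Looks up a column by name (asserts on unknown names). */
static long featureGetColumn(const Feature* f, const char* column) {
    if (strcmp(column, "chromStart") == 0) return f->chromStart;
    if (strcmp(column, "chromEnd") == 0) return f->chromEnd;
    assert(!"unknown column");
    return 0;
}

/* Recursively evaluates the tree against one record: 1 if it matches, else 0. */
int YNodeMatches(const YNode* n, const Feature* f) {
    switch (n->type) {
    case NODE_AND:
        return YNodeMatches(n->left, f) && YNodeMatches(n->right, f);
    case NODE_LT:
        return featureGetColumn(f, n->column) < n->value;
    case NODE_GT:
        return featureGetColumn(f, n->column) > n->value;
    }
    return 0;
}

/* Linear scan standing in for the database search: copies the matching
 * records into out (which must hold count items) and returns how many matched. */
size_t featureSearch(const YNode* query, const Feature* records, size_t count, Feature* out) {
    size_t i, n = 0;
    for (i = 0; i < count; ++i) {
        if (YNodeMatches(query, &records[i])) out[n++] = records[i];
    }
    return n;
}
```

Built by hand, the tree for AND(LT([chromEnd],10000),GT([chromStart],100)) is one NODE_AND node whose left child is the LT leaf on chromEnd and whose right child is the GT leaf on chromStart; it keeps only records with chromEnd < 10000 and chromStart > 100.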

The Velocity templates for Flex and Bison are available [here] and [here] (again, warning: this is a work in progress).
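Flex and Bison build the tokenizer and parser from the generated grammar files. To show what that parser has to do, here is a swapped-in technique: a small hand-written recursive-descent parser in C for the same prefix query syntax, not a Flex/Bison parser. QNode, QueryParse and the accepted grammar (AND, LT and GT only, integer constants, column names of at most 31 characters) are assumptions for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Only the operators that appear in the example query. */
typedef enum { Q_AND, Q_LT, Q_GT } QType;

typedef struct QNode_s {
    QType type;
    struct QNode_s* left;   /* AND operands */
    struct QNode_s* right;
    char column[32];        /* LT/GT column name (at most 31 chars) */
    long value;             /* LT/GT integer constant */
} QNode;

/* Frees a node and its children (NULL-safe). */
void QNodeFree(QNode* n) {
    if (n == NULL) return;
    QNodeFree(n->left);
    QNodeFree(n->right);
    free(n);
}

/* Skips blanks, then consumes c if it is the next character. Returns 1 on success. */
static int expect(const char** s, char c) {
    while (isspace((unsigned char)**s)) (*s)++;
    if (**s != c) return 0;
    (*s)++;
    return 1;
}

/* expr := "AND" '(' expr ',' expr ')'
 *       | ("LT" | "GT") '(' '[' column ']' ',' integer ')'
 * Returns a new tree, or NULL on a syntax or allocation error. */
static QNode* parseExpr(const char** s) {
    char op[4] = {0};
    size_t len = 0, k = 0;
    char* end;
    QNode* n;
    while (isspace((unsigned char)**s)) (*s)++;
    while (isupper((unsigned char)**s) && len < 3) op[len++] = *(*s)++;
    n = (QNode*)calloc(1, sizeof(QNode));
    if (n == NULL) return NULL;
    if (strcmp(op, "AND") == 0) {
        n->type = Q_AND;
        if (expect(s, '(') && (n->left = parseExpr(s)) != NULL && expect(s, ',')
                && (n->right = parseExpr(s)) != NULL && expect(s, ')'))
            return n;
    } else if (strcmp(op, "LT") == 0 || strcmp(op, "GT") == 0) {
        n->type = (op[0] == 'L') ? Q_LT : Q_GT;
        if (expect(s, '(') && expect(s, '[')) {
            while (**s != ']' && **s != '\0' && k + 1 < sizeof n->column)
                n->column[k++] = *(*s)++;
            if (expect(s, ']') && expect(s, ',')) {
                n->value = strtol(*s, &end, 10);
                if (end != *s) {
                    *s = end;
                    if (expect(s, ')')) return n;
                }
            }
        }
    }
    QNodeFree(n); /* unknown operator or syntax error */
    return NULL;
}

/* Parses a complete query string; trailing characters are an error. */
QNode* QueryParse(const char* text) {
    const char* s = text;
    QNode* root = parseExpr(&s);
    if (root == NULL) return NULL;
    while (isspace((unsigned char)*s)) s++;
    if (*s != '\0') {
        QNodeFree(root);
        return NULL;
    }
    return root;
}
```

Each grammar rule maps to one branch of parseExpr, which is roughly what a Bison grammar would express declaratively; generating the grammar from the table descriptions is what lets each database get its own query language without hand-writing this code.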

That's it.


1 comment:

Anonymous said...

I did not take much time to look at your flex and bison codes but I am already impressed and pleased you found an elegant alternative to manual (re)coding. Congrats Pierre