19 September 2008

Center for the Study of Human Polymorphisms: Week 3

In my previous post I showed how I used Apache Velocity to generate some C code for the Operon project, based on BerkeleyDB. I also generated the Makefiles and some Lex and Yacc files that define a simple language for querying each database. Today I compiled and linked my first applications. Each application uses this simple language to query its database, so I don't have to write a new piece of code for each new kind of query.

For example, the database called 'snpIds' contains a series of consecutive structures defined as:

typedef struct snpIds_t
{
    char* featureid;
    char* rs_number;
} snpIds_t;

I can now query this database like this:
snpiddump -q "OR( EQ({rs_number},\"rs10043098\"), EQ({rs_number},\"rs2377171\") ) " -f xml

(OK, the syntax looks ugly, but this design was the simplest way to avoid shift/reduce conflicts in the yacc parser.) The query is broken into tokens by the lexer and interpreted by the yacc parser. The parser builds a parse tree, which can be drawn like this:

OR
 +-- EQ
 |    +-- {rs_number}
 |    +-- "rs10043098"
 +-- EQ
      +-- {rs_number}
      +-- "rs2377171"

This tree is then evaluated against each record in the database. When a record matches, it is printed as XML, JSON or text, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<op:operon xmlns:op="http://operon.cng.fr">
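To make the evaluation step more concrete, here is a minimal sketch in C of how such a tree could be walked for each record. This is not the generated Operon code: the Node type, the node kinds and the function names are all assumptions made up for illustration, and the sample feature ids are invented. Only the two fields of snpIds_t and the two rs numbers come from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Record layout from the post. */
typedef struct snpIds_t {
    char *featureid;
    char *rs_number;
} snpIds_t;

/* Hypothetical node kinds; the real parser may use something else. */
typedef enum { NODE_OR, NODE_EQ, NODE_COLUMN, NODE_STRING } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;            /* column name or string literal, else NULL */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Resolve a leaf (column or literal) to a string for the given record. */
const char *leaf_value(const Node *n, const snpIds_t *rec) {
    assert(n != NULL && rec != NULL);
    if (n->kind == NODE_STRING) return n->text;
    if (n->kind == NODE_COLUMN) {
        if (strcmp(n->text, "rs_number") == 0) return rec->rs_number;
        if (strcmp(n->text, "featureid") == 0) return rec->featureid;
    }
    return NULL;                 /* unknown column or not a leaf */
}

/* Recursively evaluate the tree; returns 1 if the record matches, else 0. */
int eval(const Node *n, const snpIds_t *rec) {
    assert(n != NULL && rec != NULL);
    switch (n->kind) {
    case NODE_OR:
        return eval(n->left, rec) || eval(n->right, rec);
    case NODE_EQ: {
        const char *a = leaf_value(n->left, rec);
        const char *b = leaf_value(n->right, rec);
        return a != NULL && b != NULL && strcmp(a, b) == 0;
    }
    default:
        return 0;                /* a bare leaf is not a boolean expression */
    }
}

/* Print every matching record as tab-separated text and return the count. */
int count_matches(const Node *query, const snpIds_t *recs, size_t n) {
    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (eval(query, &recs[i])) {
            printf("%s\t%s\n", recs[i].featureid, recs[i].rs_number);
            hits++;
        }
    }
    return hits;
}

/* The tree for OR( EQ({rs_number},"rs10043098"), EQ({rs_number},"rs2377171") ),
   built by hand here instead of by the yacc parser. */
static const Node col_rs = { NODE_COLUMN, "rs_number", NULL, NULL };
static const Node lit_a = { NODE_STRING, "rs10043098", NULL, NULL };
static const Node lit_b = { NODE_STRING, "rs2377171", NULL, NULL };
static const Node eq_a = { NODE_EQ, NULL, &col_rs, &lit_a };
static const Node eq_b = { NODE_EQ, NULL, &col_rs, &lit_b };
const Node example_query = { NODE_OR, NULL, &eq_a, &eq_b };

/* Run the example query over three invented records. */
int example_match_count(void) {
    snpIds_t records[] = {
        { "feature1", "rs10043098" },
        { "feature2", "rs0000001" },
        { "feature3", "rs2377171" }
    };
    return count_matches(&example_query, records, 3);
}
```

Calling example_match_count() from a main() scans the three invented records and prints the ones that satisfy the query. In the real applications the records would come from a BerkeleyDB cursor rather than an in-memory array, and the tree would be built by the parser actions.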

Again, most of the code was generated from a Velocity template [here].

