YOKOFAKUN: Center for the Study of Human Polymorphisms: Week 3

In my previous post I showed how I used Apache Velocity to generate some C code for the Operon project based on BerkeleyDB. I also generated the Makefiles and some Lex and Yacc files that define a simple language for querying each database. Today I compiled and linked my first applications. Each application uses this simple language to query its database, so I don't have to write a new piece of code for each new kind of query.

For example, the database called 'snpIds' contains a series of records, each defined as:

typedef struct snpIds_t
  {
  char* featureid;
  char* rs_number;
  }snpIds,*snpIdsPtr;

I can now query this database like this:

snpiddump -q "OR( EQ({rs_number},\"rs10043098\"), EQ({rs_number},\"rs2377171\") ) " -f xml 

(OK, the syntax looks ugly, but this design was the simplest way to avoid shift/reduce conflicts in the yacc parser.) The query is broken into tokens by the lexer and interpreted by the yacc parser. The parser builds a parse tree, which can be drawn like this:

             "rs2377171"
            /
      EQUALS
     /      \
    /       {rs_number}
--OR
    \        {rs_number}
     \      /
      EQUALS
            \
             "rs10043098"

This tree is then evaluated against each record in the database. When a record matches, it is printed out as XML, JSON or text, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<op:operon xmlns:op="http://operon.cng.fr">
<op:SnpIds>
  <op:featureid>101051105133288</op:featureid>
  <op:rs_number>rs10043098</op:rs_number>
</op:SnpIds>
<op:SnpIds>
  <op:featureid>101161015120774</op:featureid>
  <op:rs_number>rs2377171</op:rs_number>
</op:SnpIds>
</op:operon>
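
To make the evaluation step more concrete, here is a minimal sketch in C of how such a parse tree could be represented and tested against each record. This is not the generated Operon code: the Node type, the NODE_* constants and the functions leaf_value, eval_node, example_query and dump_matches are all names I made up for illustration, and the records are read from a plain array rather than from BerkeleyDB. Only the snpIds structure comes from the definition above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Record layout as defined earlier in this post. */
typedef struct snpIds_t
  {
  char* featureid;
  char* rs_number;
  }snpIds,*snpIdsPtr;

/* Hypothetical node kinds; the real parser may use different ones. */
typedef enum { NODE_OR, NODE_EQ, NODE_FIELD, NODE_STRING } NodeType;

typedef struct node_t
  {
  NodeType type;
  const char* text;            /* field name or string literal, else NULL */
  const struct node_t* left;   /* operands of OR / EQ, else NULL */
  const struct node_t* right;
  }Node;

/* Resolve a FIELD or STRING leaf to a C string for the given record. */
static const char* leaf_value(const Node* n, const snpIds* rec)
  {
  assert(n != NULL && rec != NULL);
  if (n->type == NODE_STRING) return n->text;
  if (n->type == NODE_FIELD)
    {
    if (strcmp(n->text, "rs_number") == 0) return rec->rs_number;
    if (strcmp(n->text, "featureid") == 0) return rec->featureid;
    }
  return NULL; /* unknown field, or not a leaf */
  }

/* Return 1 if the record satisfies the tree rooted at n, 0 otherwise. */
int eval_node(const Node* n, const snpIds* rec)
  {
  assert(n != NULL && rec != NULL);
  switch (n->type)
    {
    case NODE_OR:
      return eval_node(n->left, rec) || eval_node(n->right, rec);
    case NODE_EQ:
      {
      const char* a = leaf_value(n->left, rec);
      const char* b = leaf_value(n->right, rec);
      return a != NULL && b != NULL && strcmp(a, b) == 0;
      }
    default:
      return 0; /* a bare leaf is not a boolean expression */
    }
  }

/* Hand-built tree for the query shown above:
   OR( EQ({rs_number},"rs10043098"), EQ({rs_number},"rs2377171") ) */
const Node* example_query(void)
  {
  static const Node field = { NODE_FIELD,  "rs_number",  NULL, NULL };
  static const Node rs_a  = { NODE_STRING, "rs10043098", NULL, NULL };
  static const Node rs_b  = { NODE_STRING, "rs2377171",  NULL, NULL };
  static const Node eq_a  = { NODE_EQ, NULL, &field, &rs_a };
  static const Node eq_b  = { NODE_EQ, NULL, &field, &rs_b };
  static const Node root  = { NODE_OR, NULL, &eq_a,  &eq_b };
  return &root;
  }

/* Scan an in-memory array of records, print each match as an XML
   fragment (values are not XML-escaped) and return the match count. */
int dump_matches(const Node* root, const snpIds* recs, int n, FILE* out)
  {
  int i, count = 0;
  for (i = 0; i < n; ++i)
    {
    if (!eval_node(root, &recs[i])) continue;
    fprintf(out, "<op:SnpIds>\n  <op:featureid>%s</op:featureid>\n"
                 "  <op:rs_number>%s</op:rs_number>\n</op:SnpIds>\n",
            recs[i].featureid, recs[i].rs_number);
    ++count;
    }
  return count;
  }
```

Calling dump_matches(example_query(), records, n, stdout) on an array of snpIds records would print one fragment per matching record. In the real tool the tree comes from the yacc parser and the records from a BerkeleyDB cursor, so only the recursive evaluation is meant to carry over.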

Again, most of the code was written using a Velocity template [here].

Pierre

19 September 2008