Center for the Study of Human Polymorphisms: Week 3
In my previous post I showed how I used Apache Velocity to generate some C code for the Operon project based on BerkeleyDB. I also generated the Makefiles and some Lex and Yacc files to create a simple language for querying each database. Today I compiled and linked my first applications. Each application uses this simple language to query its database, so I don't have to write a new piece of code for each new kind of query.
For example, the database called 'snpIds' contains a series of structures defined as:
typedef struct snpIds_t
{
    char* featureid;
    char* rs_number;
}snpIds,*snpIdsPtr;
I can now query this database like this:
snpiddump -q "OR( EQ({rs_number},\"rs10043098\"), EQ({rs_number},\"rs2377171\") ) " -f xml
(OK, the syntax looks ugly, but this design was the simplest way to avoid shift/reduce conflicts in the yacc parser.) The query is broken into tokens by the lexer and interpreted by the yacc parser. The parser builds a parse tree, which can be drawn like this:
              "rs2377171"
             /
       EQUALS
      /      \
     /        {rs_number}
--OR
     \        {rs_number}
      \      /
       EQUALS
             \
              "rs10043098"
This tree is then evaluated against each record in the database. When a record matches, it is printed in the requested format (xml, json or text), e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<op:operon xmlns:op="http://operon.cng.fr">
<op:SnpIds>
<op:featureid>101051105133288</op:featureid>
<op:rs_number>rs10043098</op:rs_number>
</op:SnpIds>
<op:SnpIds>
<op:featureid>101161015120774</op:featureid>
<op:rs_number>rs2377171</op:rs_number>
</op:SnpIds>
</op:operon>
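To make the evaluation step more concrete, here is a rough C sketch of how a parse tree like the one above could be tested against each record and the matches printed as XML. This is not the generated Operon code: the `Node` type, `leaf_value`, `node_eval`, `dump_xml` and `EXAMPLE_QUERY` are hypothetical names invented for the illustration, the tree is built by hand rather than by the yacc parser, and the records come from a plain in-memory array instead of a BerkeleyDB database.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Record layout, as defined in the post. */
typedef struct snpIds_t {
    char *featureid;
    char *rs_number;
} snpIds, *snpIdsPtr;

/* Hypothetical node type for the parse tree (an assumption, not Operon's). */
typedef enum { NODE_OR, NODE_EQ, NODE_FIELD, NODE_STRING } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;                /* field name or literal, else NULL */
    const struct Node *left, *right; /* children of OR / EQ, else NULL */
} Node;

/* Resolve a leaf to a string: a field of the record, or the literal itself. */
static const char *leaf_value(const Node *n, const snpIds *rec)
{
    if (n->kind == NODE_STRING) return n->text;
    if (n->kind == NODE_FIELD) {
        if (strcmp(n->text, "featureid") == 0) return rec->featureid;
        if (strcmp(n->text, "rs_number") == 0) return rec->rs_number;
    }
    return NULL; /* unknown field, or not a leaf at all */
}

/* Return 1 if the record satisfies the tree, 0 otherwise. */
static int node_eval(const Node *n, const snpIds *rec)
{
    switch (n->kind) {
    case NODE_OR:
        return node_eval(n->left, rec) || node_eval(n->right, rec);
    case NODE_EQ: {
        const char *a = leaf_value(n->left, rec);
        const char *b = leaf_value(n->right, rec);
        return a != NULL && b != NULL && strcmp(a, b) == 0;
    }
    default:
        return 0; /* a bare leaf is not a boolean expression */
    }
}

/* Scan the records and print each match as XML; returns the match count.
   Values are written unescaped, which is only safe for simple identifiers. */
static int dump_xml(const Node *query, const snpIds *recs, size_t n, FILE *out)
{
    int matches = 0;
    assert(query != NULL && out != NULL);
    fputs("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n", out);
    fputs("<op:operon xmlns:op=\"http://operon.cng.fr\">\n", out);
    for (size_t i = 0; i < n; i++) {
        if (!node_eval(query, &recs[i])) continue;
        fprintf(out, "<op:SnpIds>\n<op:featureid>%s</op:featureid>\n"
                     "<op:rs_number>%s</op:rs_number>\n</op:SnpIds>\n",
                recs[i].featureid, recs[i].rs_number);
        matches++;
    }
    fputs("</op:operon>\n", out);
    return matches;
}

/* The tree for the query in the post, built by hand. The {rs_number}
   leaf is shared by both EQ nodes, which is harmless for evaluation. */
static const Node F  = { NODE_FIELD,  "rs_number",  NULL, NULL };
static const Node S1 = { NODE_STRING, "rs10043098", NULL, NULL };
static const Node S2 = { NODE_STRING, "rs2377171",  NULL, NULL };
static const Node E1 = { NODE_EQ, NULL, &F, &S1 };
static const Node E2 = { NODE_EQ, NULL, &F, &S2 };
static const Node EXAMPLE_QUERY = { NODE_OR, NULL, &E1, &E2 };
```

Calling `dump_xml(&EXAMPLE_QUERY, recs, n, stdout)` on an array of `snpIds` records would print only those whose `rs_number` equals one of the two literals. Supporting the json and text outputs would presumably mean swapping the printing part while keeping `node_eval` unchanged.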
Again, most of the code for these applications was written using a Velocity template [here].
Pierre