YOKOFAKUN: February 2010

23 February 2010

A MySQL user-defined function (UDF) for Gene Ontology (GO)

In this post I'll create a MySQL user-defined function (UDF) that tells whether a given GO term is a descendant of another term. This post is not much different from two previous posts I wrote here.

Here I built a binary file containing an array of (term, parent_term) pairs of GO identifiers from the RDF/XML file go_daily-termdb.rdf-xml.gz. This file is then used by the MySQL UDF to find all the ancestors of a given term. The source code is available here:

http://code.google.com/p/lindenb/source/browse/trunk/proj/go_udf/

Building the database

The libxml API is used to load go_daily-termdb.rdf-xml.gz. For each go:term, the parent terms are searched and each pair(term,parent-term) is saved into the array.

static void scanterm(xmlNode *term)
{
xmlAttrPtr rsrc;
xmlNode *n=NULL;
xmlChar * acn=NULL;
for(n=term->children; n!=NULL; n=n->next)
{
if (n->type != XML_ELEMENT_NODE) continue;
if(strcmp(n->name,"accession")==0 &&
n->ns!=NULL && n->ns->href!=NULL &&
strcmp(n->ns->href,GO_NS)==0)
{
acn= xmlNodeGetContent(n);
break;
}
}
if(acn==NULL) return;

for(n=term->children; n!=NULL; n=n->next)
{
if (n->type != XML_ELEMENT_NODE) continue;
if(strcmp(n->name,"is_a")==0 &&
n->ns!=NULL && n->ns->href!=NULL &&
strcmp(n->ns->href,GO_NS)==0
)
{
xmlChar* is_a=xmlGetNsProp(n,"resource",RDF_NS);
if(is_a!=NULL)
{
char* p=strstr(is_a,"#GO:");
if(p!=NULL)
{
++p;
termdb.terms=(TermPtr)realloc(termdb.terms,sizeof(Term)*(termdb.n_terms+1));
if(termdb.terms==NULL)
{
fprintf(stderr,"out of memory\n");
exit(EXIT_FAILURE);
}
strncpy(termdb.terms[termdb.n_terms].parent,p,MAX_TERM_LENGTH);
strncpy(termdb.terms[termdb.n_terms].child,acn,MAX_TERM_LENGTH);
++termdb.n_terms;
}
xmlFree(is_a);
}
}
}
xmlFree(acn);
}

Once the RDF file has been read, the array is sorted by term and saved to a file:

qsort(termdb.terms,termdb.n_terms,sizeof(Term),cmp_term);
out= fopen(argv[1],"wb");
if(out==NULL)
{
fprintf(stderr,"Cannot open %s\n",argv[1]);
return -1;
}
fwrite(&(termdb.n_terms),sizeof(int),1,out);
fwrite(termdb.terms,sizeof(Term),termdb.n_terms,out);
fflush(out);
fclose(out);

Initializing the MySQL UDF

When the MySQL UDF is initialized, the binary file is loaded into memory (error handling omitted):

my_bool go_isa_init(
UDF_INIT *initid,
UDF_ARGS *args,
char *message
)
{
TermDBPtr termdb;
FILE* in=NULL;

initid->maybe_null=1;
initid->ptr= NULL;

termdb=(TermDBPtr)malloc(sizeof(TermDB));

in=fopen(GO_PATH,"rb");
fread(&(termdb->n_terms),sizeof(int),1,in);
termdb->terms=malloc(sizeof(Term)*(termdb->n_terms));
fread(termdb->terms,sizeof(Term),termdb->n_terms,in);
fclose(in);
initid->ptr=(void*)termdb;
return 0;
}

Disposing the MySQL UDF

When the UDF is disposed, we just free the memory:

/* The deinitialization function */
void go_isa_deinit(UDF_INIT *initid)
{
/* free the memory **/
if(initid->ptr!=NULL)
{
TermDBPtr termdb=(TermDBPtr)initid->ptr;
if(termdb->terms!=NULL) free(termdb->terms);
free(termdb);
initid->ptr=NULL;
}
}

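Invoking the UDF

The UDF itself walks up the GO graph (a directed acyclic graph rather than a tree, since a term can have several parents) and visits the ancestors of a given term until the requested parent is found: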

long long go_isa(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error)
{
(...)
index=termdb_findIndexByName(termdb,term);
if(index==-1)
{
return 0;
}
return recursive_search(termdb,index,parent);
}

static int recursive_search(const TermDBPtr db,int index, const char* parent)
{
int rez=0;
int start=index;
int parent_idx=0;

if(start<0 || start>=db->n_terms) return 0;
if(strcmp(db->terms[index].child,parent)==0) return 1;
while(index < db->n_terms)
{
if(strcmp(db->terms[index].child,db->terms[start].child)!=0) break;
if(strcmp(db->terms[index].parent,parent)==0) return 1;
parent_idx= termdb_findIndexByName(db,db->terms[index].parent);
rez= recursive_search(db,parent_idx,parent);
if(rez==1) return 1;
++index;
}
return 0;
}

As the pairs have been sorted by term identifier, a binary search (a lower bound, so that the first pair of a term is returned) is used to find the index of a given term:

static int lower_bound(const TermDBPtr termsdb, const char* name)
{
int low = 0;
int len= termsdb->n_terms;

while(len>0)
{
int half=len/2;
int mid=low+half;
if( strncmp(termsdb->terms[mid].child,name,MAX_TERM_LENGTH)<0)
{
low=mid;
++low;
len=len-half-1;
}
else
{
len=half;
}
}
return low;
}

static int termdb_findIndexByName(const TermDBPtr termsdb,const char* name)
{
int i=0;
if(name==NULL || termsdb==NULL || termsdb->terms==NULL || termsdb->n_terms==0) return -1;
i= lower_bound(termsdb,name);
if(i<0 || i >= termsdb->n_terms || strcmp(termsdb->terms[i].child,name)!=0) return -1;
return i;
}
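
To experiment with the lookup logic without compiling the UDF, the same idea (pairs sorted by child, a lower-bound binary search, then a recursive walk up the parents) can be sketched in Java. This is only an illustration under assumptions: the class GoIsaSketch, its method names and the "TOY:" identifiers are hypothetical and are not part of go_udf.c.

```java
import java.util.Arrays;
import java.util.Comparator;

final class GoIsaSketch {
    /** pairs[i] = {child, parent}, kept sorted by child like the binary file. */
    private final String[][] pairs;

    GoIsaSketch(String[][] unsorted) {
        pairs = unsorted.clone();
        Arrays.sort(pairs, Comparator.comparing((String[] p) -> p[0]));
    }

    /** Index of the first pair whose child is not smaller than name. */
    private int lowerBound(String name) {
        int low = 0;
        int len = pairs.length;
        while (len > 0) {
            int half = len / 2;
            int mid = low + half;
            if (pairs[mid][0].compareTo(name) < 0) {
                low = mid + 1;
                len = len - half - 1;
            } else {
                len = half;
            }
        }
        return low;
    }

    /**
     * True if term equals ancestor or reaches it by following parents.
     * Unlike the C code, a term that never appears as a child still matches itself.
     */
    boolean isA(String term, String ancestor) {
        if (term.equals(ancestor)) return true;
        // all the pairs of a given child are contiguous once sorted
        for (int i = lowerBound(term); i < pairs.length && pairs[i][0].equals(term); i++) {
            if (isA(pairs[i][1], ancestor)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // made-up identifiers, not real GO terms
        GoIsaSketch db = new GoIsaSketch(new String[][] {
            {"TOY:3", "TOY:2"}, {"TOY:2", "TOY:1"}, {"TOY:4", "TOY:1"}
        });
        System.out.println(db.isA("TOY:3", "TOY:1")); // true
        System.out.println(db.isA("TOY:4", "TOY:2")); // false
    }
}
```

As in the C version, nothing is cached, so a term with many paths to the root may be visited several times.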

Compiling

On my laptop, the UDF was compiled using

gcc -shared  -DGO_PATH='"/tmp/terms.bin"' -I /usr/include -I /usr/local/include -I /usr/include/mysql  -o /usr/lib/go_udf.so src/go_udf.c

Creating the function

mysql> create function go_isa RETURNS INTEGER SONAME 'go_udf.so';

Invoking the UDF function

Find all the GO terms that are descendants of GO:0016859 (cis-trans isomerase activity); the term itself is also returned:

mysql> select LEFT(T.name,20) as name,T.acc from go_latest.term as T where go_isa(T.acc,"GO:0016859");
+----------------------+------------+
| name                 | acc        |
+----------------------+------------+
| peptidyl-prolyl cis- | GO:0003755 |
| retinal isomerase ac | GO:0004744 |
| maleylacetoacetate i | GO:0016034 |
| cis-trans isomerase  | GO:0016859 |
| cis-4-[2-(3-hydroxy) | GO:0018839 |
| trans-geranyl-CoA is | GO:0034872 |
| carotenoid isomerase | GO:0046608 |
| 2-chloro-4-carboxyme | GO:0047466 |
| 4-hydroxyphenylaceta | GO:0047467 |
| farnesol 2-isomerase | GO:0047885 |
| furylfuramide isomer | GO:0047907 |
| linoleate isomerase  | GO:0050058 |
| maleate isomerase ac | GO:0050076 |
| maleylpyruvate isome | GO:0050077 |
| retinol isomerase ac | GO:0050251 |
+----------------------+------------+
15 rows in set (1.27 sec)

Disposing the UDF function

drop function go_isa;

That's it
Pierre

18 February 2010

eXist: The Open Source Native XML Database : My notebook

In a previous post, I played with Oracle's BerkeleyDB-XML. Here, I've used eXist-db, an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.

Download & Install

wget http://downloads.sourceforge.net/project/exist/Stable/1.4/eXist-setup-1.4.0-rev10440.jar
java -jar eXist-setup-1.4.0-rev10440.jar
export EXIST_HOME=PATH_TO_EXIST/eXist

And that's it: it was far easier than installing (compiling...) BerkeleyDB-XML.

Starting the Server

eXist/bin/startup.sh

Using locale: en_US.UTF-8
18 Feb 2010 15:31:32,579 [main] INFO (JettyStart.java [run]:90) - Configuring eXist from EXIST/eXist/conf.xml
18 Feb 2010 15:31:32,580 [main] INFO (JettyStart.java [run]:91) -
18 Feb 2010 15:31:32,580 [main] INFO (JettyStart.java [run]:92) - Running with Java 1.6.0_07 [Sun Microsystems Inc. (Java HotSpot(TM) Server VM) in /usr/local/package/jdk1.6.0_07/jre]
(...)

Inserting the data

First, using the web console, I created a 'collection' named '/db/dbsnp'.
I then downloaded about 1000 XML documents from dbSNP:

for S in `mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select right(name,length(name)-2) from snp130 limit 1000' -N`
do
wget -O rs${S}.xml "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=${S}&retmode=xml"
done

Those documents have been inserted into the XML database:

EXIST/eXist/bin/client.sh -u admin --password=mypassword -s -m 'db/dbsnp' -p rs*.xml
(...)
parsed 7704 bytes in 33ms.
storing document rs10.xml (1 of 1) ...done.
parsing 16306 bytes took 36ms.

parsed 16306 bytes in 36ms.

Using XQuery

The following XQuery searches the collection '/db/dbsnp' for all the SNPs having a heterozygosity greater than 0.49. For each SNP, it prints its name, its sequence and its position on the reference genome.

xquery version "1.0";

declare namespace s="http://www.ncbi.nlm.nih.gov/SNP/docsum";

<MyListOfSnp>
{
for $x in collection("/db/dbsnp")/s:ExchangeSet/s:Rs
where data($x/s:Het/@value)>0.49
return <SNP>
<name>rs{data($x/@rsId)}</name>
<sequence>{data($x/s:Sequence/s:Seq5)}[{data($x/s:Sequence/s:Observed)}]{data($x/s:Sequence/s:Seq3)}</sequence>

{
for $as in $x/s:Assembly
where $as/@groupLabel="reference" return

for $comp in $as/s:Component return

for $maploc in $comp/s:MapLoc
return
<map>
<chromosome>chr{data($comp/@chromosome)}</chromosome>
<position>{data($maploc/@physMapInt)}</position>
</map>
}
</SNP>
}
</MyListOfSnp>

Executing the query:

EXIST/eXist/bin/client.sh -u admin --password=mypassword -F input.xquery

Result:

<MyListOfSnp>
<SNP>
<name>rs10000300</name>
<sequence>ATCAAATACCCAAGCAAAGATTTACATTCAAATCTGTTTACTGAAGTTCTATTTATAATACAATGCAATGAACATAATAGTATATATTTACACGTAATGTAATAAACACAAATATTCAATGGTATAAAAATGGTCAATAAATCGTGGCATAGCCACAGCTTAGAGTACCTGTTTAATGTTCTCAGCTATTTTAACTTTGCTAAATAATATTTAAAGATATGcggtagtcccccttcatctgaggaggacctgttccaagacccccagtggatgcctgaaacctctgatagtaatgaaccctatatatactgttttttcctatacatatttacatatataatacatacctatgattaagtttaatttataaattaggcacagtaagagattaacaacaacaataataaaatgtaacaattatagcaatactctaataataaagttatgtgagtgtggtctctctctctgtctcaaaatatcatactgtatgcctctatttt[G/T]ggaatacagttgacaacgggtaactgaaaccgagaaaagtgaaactgcagatgggggctgactactgTATATGAAAATTAAACAATCagccaggcatggtggctcacgcctgtaataccagcactttgggaggccgaggcgggaggatcacgaggtcaggagatcgagaccacggtgaaaccccgtctctattaaaaatacaaaaaaaaaattagccgggtacagtggcaggcacctgtagtcccagctactcgggaggctgaggcaggagaatggcgtgaacccgggaggcagagcttgcagtgagccgagatcgcgccactgcactccagcctgggcgacagagcaagactctgtctcaaaaaaaaaaaaaaaaaaaaaaaaaaGGAAAAGAAAATTAAACAACCAAACAAAATCAGAGTAAATACACCATGTTAATTCTGGTTATATTTGGATTGTGGGCTTATGGGTAGATTTTGTTACATTTTTCTATAATTTCC</sequence>
<map>
<chromosome>chr4</chromosome>
<position>40161303</position>
</map>
</SNP>
<SNP>
<name>rs10000307</name>
<sequence>GTTCAAAGACTCCTGATTAGAGTGTCCTTTCTATAACCAATCTTGTTCCTTAAAACATCTTGAATGATTTGATCTCAGATCCCCTGAAGGGACTGCTGAGATCATCTGCACCAATCCTAAAAAAAAAAATCTTTCATGCCCAAACCCTTAGCAAAGCTAGTTTCTTGTGGGACTCTTAATCCCTCTTATCCTGCTTGACACAGAGGTGCTCACCTGCTGTGCATCAGAAACACTATGGATACTTCTTGAAAGTGCCTGCAACAGAGATACTGATGCATCTGTTGTGGGGTGGGCCCTAGGTATCAGTAATTTAAAAAGTTTTTTAAAATACG[C/T]CCCAGATAATTCTGATTGCTTGTAAATGGCAAAGGTTGAGAAGCACTGCTGGAAGCTTTTGAGCTCCTGTTGGGTAAGTTCAAGCGACAGGAGAATCTCATAGTGATCATAAAACAGCACTCTGAATTCTTGGAGAAACCCAGACTCATCTTATGTGACTAATTTCCTTAATGTGTACCCCAAAACTATCCTAGCGCGTTCACAGGTACACCAGGTAATGCTATTCTGATTGAGCACCCAAGAGTCTC</sequence>
<map>
<chromosome>chr4</chromosome>
<position>188340930</position>
</map>
</SNP>
<SNP>
<name>rs1000031</name>
<sequence>CTTTGAGGATCTCGATGAAAAATCTGCACCTCTCCCAGAAAAATGCACCTCTGCACAGGTTCACAGATGTCTGCATACAATTTCAGGGTTCTCAGACCCTGAAGGCCACCAAGGGACCCAAGTACATGAGCCTTACACAGCACAACCTAAATCGTCAATGGCAATGTCTCAGGAGTGTAGGACAGTGACTGCCTCTGTAAGACCATCAGCACAGCCATGGCCACACATGTTGTCTGGAGGATCAGGTGGCCTTTTTCTGTGGCTTTTGAGGTTGAGGCTGGGTACCCTTGTGGCTAATGCATAATGCCAGGATGGCCAATAAAGACACCATAAAAATTCCCTGCCGTGTGCCTGACACTGGACAGATTTAATCTCCAGGTCTTCTGGGAACCCCGCAgaggcaggggctgttttctcattttactgatggaaactgaggctcaaggaagtgaaggaatttgtttcaagtcccaggcagtacca[C/T]gaacatgggatttgaaatcacgcaagtctgacACGCAAACCTTGGTTCTTTCCTTTTTCCCTTCTCACAGAGGGTGCTTTTCGCTTCCCGGAAGCTGGCAGGGAGTTCCTCTAAAGCGCAGGTTGGAGTGGTCAGAAGGGAGCGAACTGACAGCACGAGGAAGGCTCAGCGCATGCCAGCTCCACTCACGGGAAATGACTCACTGCAGCCCTGCTGCTCTCGGGCTCCGGGGGACACATCCACATTTCCTGTATCTCGGCTAGAGCCTTGGGCAGTGTGAGCTGGCAGGGCAGATCGCTGAAGGCGGCTAGAGATAGAAAACCACCCAGCTCTGCATCCTGAGACAAAGAAGCCTTTCCCTGGGCTCATATGATAGAGGTACGTTGCctctgggcctcagttttgccatctgtaaaatagggTGAAGGTCAGATTAGATTGGGCATATTCAGTGTG</sequence>
<map>
<chromosome>chr18</chromosome>
<position>44615438</position>
</map>
</SNP>
(...)
</MyListOfSnp>

That's it
Pierre

17 February 2010

The path from EgonWillighagen to Jandot : Neo4j , a graph API for java: my notebook.

neo4j: "Neo4j is a graph database. It is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables. A graph (mathematical lingo for a network) is a flexible data structure that allows a more agile and rapid style of development."
In the current post, I'll use the neo4j API to load a set of pubmed entries and find the shortest path between two authors.
First, a few constants are defined:

private static final String NODE_TYPE="node-type";
private static final String TYPE_ARTICLE="article";
private static final String TYPE_AUTHOR="author";
private static final String PROPERTY_PMID="pmid";
private static final String PROPERTY_NAME="name";
private static final String PROPERTY_TITLE="title";

We also need a Neo4j embedded graph as well as two indexes that will be used to quickly find whether an article or an author has already been inserted in the graph:

//this graph will be stored in /tmp/neo4j
private GraphDatabaseService graphDB =new EmbeddedGraphDatabase("/tmp/neo4j");
private IndexService findByPmid= new LuceneIndexService( graphDB );
private IndexService findByName= new LuceneIndexService( graphDB );

When the program exits, all those objects have to be closed:

private void close()
{
if(findByPmid!=null)
{
findByPmid.shutdown();
findByPmid=null;
}
if(findByName!=null)
{
findByName.shutdown();
findByName=null;
}
if(graphDB!=null)
{
graphDB.shutdown();
}
graphDB=null;
}

A StAX parser is used to read the PubMed XML entries. For this test, I've loaded a bunch of articles from 'Nature' and from the Biogang. Each time a PubMed identifier (PMID) is found, the matching node is looked up in the findByPmid index; if it does not exist yet, a new node is created:

Transaction txn=graphDB.beginTx();
(...)
if(name.equals("PMID") && article==null)
{
Integer pmid=Integer.valueOf(reader.getElementText());
article= findByPmid.getSingleNode(PROPERTY_PMID, pmid);

//article was not already in the graph
if(article==null)
{
article= this.graphDB.createNode();
article.setProperty(NODE_TYPE, TYPE_ARTICLE);
article.setProperty(PROPERTY_PMID, pmid);
findByPmid.index(article, PROPERTY_PMID, pmid);
}
}
(...)
txn.success();

Likewise, each time an author is found, the matching node is looked up in the findByName index; if it does not exist yet, a new node is created:

while(!(evt=reader.nextEvent()).isEndDocument())
{
if(evt.isEndElement())
{
if(initials.isEmpty() && !firstName.isEmpty()) initials=""+firstName.charAt(0);
String s= initials+" "+middle+" "+lastName+" "+suffix;
String name= s.replaceAll("[ ]+", " ").trim();
if(name.isEmpty()) return null;
Node author=findByName.getSingleNode(PROPERTY_NAME, name);
if(author==null)
{
author= this.graphDB.createNode();
author.setProperty(NODE_TYPE, TYPE_AUTHOR);
author.setProperty(PROPERTY_NAME, name);
findByName.index(author, PROPERTY_NAME, name);
}
return author;
}
if(!evt.isStartElement()) continue;
String tag= evt.asStartElement().getName().getLocalPart();
String content= reader.getElementText().trim();
if(tag.equals("LastName"))
{
lastName= content;
}
else if(tag.equals("FirstName") || tag.equals("ForeName"))
{
firstName= content;
}
else if(tag.equals("Initials"))
{
initials= content;
}
else if(tag.equals("MiddleName"))
{
middle= content;
}
else if(tag.equals("CollectiveName"))
{
return null;
}
else if(tag.equals("Suffix"))
{
suffix= content;
}

For each author in an Article, a Relationship is created:

author.createRelationshipTo(article, PubmedRelations.IS_AUTHOR_OF);

Now we can find the path between Jan Aerts and Egon Willighagen (OK, those authors could be homonyms: we need a unique identifier for the authors!).

txn=this.graphDB.beginTx();
Node startAuthor= findByName.getSingleNode(PROPERTY_NAME, "J Aerts");
Node endAuthor= findByName.getSingleNode(PROPERTY_NAME, "EL Willighagen");
SingleSourceShortestPath<Integer> pathBFS;
pathBFS = new SingleSourceShortestPathBFS(
startAuthor,
Direction.BOTH,
PubmedRelations.IS_AUTHOR_OF
);

for(Node n: pathBFS.getPathAsNodes(endAuthor))
{
echo(n);
}

txn.finish();

Source code

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.List;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

import org.neo4j.graphalgo.shortestpath.SingleSourceShortestPath;
import org.neo4j.graphalgo.shortestpath.SingleSourceShortestPathBFS;
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.index.IndexService;
import org.neo4j.index.lucene.LuceneIndexService;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class PubmedGraph
{
private GraphDatabaseService graphDB;
private IndexService findByPmid;
private IndexService findByName;

private static final String NODE_TYPE="node-type";
private static final String TYPE_ARTICLE="article";
private static final String TYPE_AUTHOR="author";
private static final String PROPERTY_PMID="pmid";
private static final String PROPERTY_NAME="name";
private static final String PROPERTY_TITLE="title";

private static enum PubmedRelations
implements RelationshipType
{
IS_AUTHOR_OF
}

private PubmedGraph()
{

}
private void open()
{
close();
graphDB=new EmbeddedGraphDatabase("/tmp/neo4j");
findByPmid= new LuceneIndexService( graphDB );
findByName= new LuceneIndexService( graphDB );
}
private void close()
{
if(findByPmid!=null)
{
findByPmid.shutdown();
findByPmid=null;
}
if(findByName!=null)
{
findByName.shutdown();
findByName=null;
}
if(graphDB!=null)
{
graphDB.shutdown();
}
graphDB=null;
}

private boolean read(InputStream in)
{
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.FALSE);
xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
xmlInputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
XMLEventReader reader=null;

Transaction txn=null;
try {
reader= xmlInputFactory.createXMLEventReader(in);
txn = graphDB.beginTx();

Node article=null;
while(reader.hasNext())
{
XMLEvent event = reader.nextEvent();
if(event.isStartElement())
{
StartElement e=event.asStartElement();
String name=e.getName().getLocalPart();
if(name.equals("PubmedArticle"))
{
article=null;
}
else if(name.equals("ArticleTitle") && article!=null && !article.hasProperty(PROPERTY_TITLE))
{
article.setProperty(PROPERTY_TITLE, reader.getElementText());
}
else if(name.equals("PMID") && article==null)
{
Integer pmid=Integer.valueOf(reader.getElementText());

article= findByPmid.getSingleNode(PROPERTY_PMID, pmid);

if(article==null)
{
article= this.graphDB.createNode();
article.setProperty(NODE_TYPE, TYPE_ARTICLE);
article.setProperty(PROPERTY_PMID, pmid);
findByPmid.index(article, PROPERTY_PMID, pmid);
}
}
else if(name.equals("Author") && article!=null)
{
Node author= author(reader);
if(author!=null)
{
author.createRelationshipTo(article, PubmedRelations.IS_AUTHOR_OF);
}
}
}
else if(event.isEndElement())
{
EndElement e=event.asEndElement();
String name=e.getName().getLocalPart();
if(name.equals("PubmedArticle"))
{
article=null;
}
}
}
txn.success();
return true;
}
catch (Exception e)
{
e.printStackTrace();
return false;
}
finally
{
if(txn!=null) txn.finish();
if(reader!=null) try {reader.close();}catch(Exception e2){}
}

}
private Node author(XMLEventReader reader) throws IOException,XMLStreamException
{
String lastName="";
String firstName="";
String initials="";
String middle="";
String suffix="";
XMLEvent evt;
while(!(evt=reader.nextEvent()).isEndDocument())
{
if(evt.isEndElement())
{
if(initials.isEmpty() && !firstName.isEmpty()) initials=""+firstName.charAt(0);
String s= initials+" "+middle+" "+lastName+" "+suffix;
String name= s.replaceAll("[ ]+", " ").trim();
if(name.isEmpty()) return null;
Node author=findByName.getSingleNode(PROPERTY_NAME, name);
if(author==null)
{
author= this.graphDB.createNode();
author.setProperty(NODE_TYPE, TYPE_AUTHOR);
author.setProperty(PROPERTY_NAME, name);
findByName.index(author, PROPERTY_NAME, name);
}
return author;
}
if(!evt.isStartElement()) continue;
String tag= evt.asStartElement().getName().getLocalPart();
String content= reader.getElementText().trim();
if(tag.equals("LastName"))
{
lastName= content;
}
else if(tag.equals("FirstName") || tag.equals("ForeName"))
{
firstName= content;
}
else if(tag.equals("Initials"))
{
initials= content;
}
else if(tag.equals("MiddleName"))
{
middle= content;
}
else if(tag.equals("CollectiveName"))
{
return null;
}
else if(tag.equals("Suffix"))
{
suffix= content;
}

else
{
//"###ignoring "+tag+"="+content);
}
}
throw new IOException("Cannot parse Author");
}

public void echo(Node n)
{
System.out.println("nodeId."+n.getId());
for(String p: n.getPropertyKeys())
{
System.out.println(" "+p+":\t"+n.getProperty(p));
}
System.out.println();
}

public void dump()
{
System.out.println("Dump:");
Transaction txn=null;
try {
txn=this.graphDB.beginTx();
for(Node n: this.graphDB.getAllNodes())
{
echo(n);
}
} finally
{
if(txn!=null) txn.finish();
}

}

private void search()
{
Transaction txn=null;
try {
txn=this.graphDB.beginTx();
Node startAuthor= findByName.getSingleNode(PROPERTY_NAME, "J Aerts");
if(startAuthor==null) { System.err.println("Not found");return;}
Node endAuthor= findByName.getSingleNode(PROPERTY_NAME, "EL Willighagen");

if(endAuthor==null) return;

System.err.println("Walking paths");
SingleSourceShortestPath<Integer> pathBFS;
pathBFS = new SingleSourceShortestPathBFS(
startAuthor,
Direction.BOTH,
PubmedRelations.IS_AUTHOR_OF
);

for(Node n: pathBFS.getPathAsNodes(endAuthor))
{
echo(n);
}

System.err.println("Done");

} finally
{
if(txn!=null) txn.finish();
}

}

public static void main(String[] args)
{

PubmedGraph app=new PubmedGraph();
try {
int optind=0;

app.open();

while(optind< args.length)
{
InputStream in= new FileInputStream(args[optind++]);
app.read(in);
in.close();
}

app.search();
} catch (Exception e) {
e.printStackTrace();
}
finally
{
if(app!=null) app.close();
}
}
}
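
For readers without neo4j at hand, the breadth-first search that finds the shortest path can be sketched with plain Java collections. This is not neo4j's implementation: the class CoAuthorPathSketch, its methods and the placeholder author/article names are hypothetical. Authors and articles are both plain nodes, and each IS_AUTHOR_OF relationship is walked in both directions, as with Direction.BOTH above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class CoAuthorPathSketch {
    private final Map<String, List<String>> neighbours = new HashMap<>();

    /** Adds an undirected author/article edge. */
    void link(String author, String article) {
        neighbours.computeIfAbsent(author, k -> new ArrayList<>()).add(article);
        neighbours.computeIfAbsent(article, k -> new ArrayList<>()).add(author);
    }

    /** Breadth-first search; returns the nodes from start to end, or an empty list. */
    List<String> shortestPath(String start, String end) {
        // previous.get(n) is the node from which n was first reached
        Map<String, String> previous = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        previous.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(end)) {
                List<String> path = new ArrayList<>();
                for (String n = end; n != null; n = previous.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : neighbours.getOrDefault(current, Collections.emptyList())) {
                if (!previous.containsKey(next)) {
                    previous.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        CoAuthorPathSketch g = new CoAuthorPathSketch();
        g.link("author A", "article 1");
        g.link("author B", "article 1");
        g.link("author B", "article 2");
        g.link("author C", "article 2");
        // prints [author A, article 1, author B, article 2, author C]
        System.out.println(g.shortestPath("author A", "author C"));
    }
}
```

Because every edge has the same weight, the first time the search reaches the end node it has followed a path with the fewest possible hops.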

That's it
Pierre

15 February 2010

Semantic Web Services with the SADI Framework: my notebook.

At Biohackathon2010, Mark Wilkinson and Luke McCarthy introduced the SADI Framework. From sadiframework.org: SADI is a framework for discovery of, and interoperability between, distributed data and analytical resources. It combines simple, stateless, GET/POST-based Web Services with standards from the W3C Semantic Web initiative. The objective of SADI is to make it easy for data and analytical tool providers to quickly make their resources available on the Semantic Web with minimal disruption to their usual practices. (...)

SADI Services consume and provide data via simple HTTP POST and GET
SADI Services consume and produce data in RDF format. This allows SADI Services to exploit existing OWL reasoners and SPARQL query engines to enhance interoperability between Services and the interpretation of the data being passed between them
Service interfaces (i.e., Inputs and Outputs) are defined in terms of OWL-DL classes; the property restrictions on these OWL classes define what specific data elements are required by the Service and what data will be provided by the Service, respectively
Input RDF data - data that is compliant with the Input OWL Class - is “decorated” or “annotated” by the service provider to include new properties. These properties will (of course) be a function of the lookup/analytical operations performed by the Web Service.
Importantly, discovery of SADI Services can include searches for the properties the user wants to add to their data. This contrasts with other Semantic Web Service standards which attempt only to define the computational process by which input data is analysed, rather than the properties that process generates between the input and output data. This is KEY to the semantic behaviours of SADI.
SADI Web Services are stateless and atomic.

In the current post, I just want to understand how SADI invokes the services.

The classical Web Services are described using WSDL (Web Services Description Language) and the messages are transported with SOAP.

In SADI, as far as I understand it, the description of the services, the operations, the inputs, the outputs and the transport of the messages all use an RDF/OWL format.

http://sadiframework.org/registry/ contains all the services handled by the SADI framework. For example calling http://sadiframework.org/services/getPubMedReferencesForPDB returns the following XML document describing the service:

<rdf:RDF
xmlns="http://www.w3.org/2002/07/owl#"
 xmlns:a="http://www.mygrid.org.uk/mygrid-moby-service#"
 xmlns:b="http://protege.stanford.edu/plugins/owl/dc/protege-dc.owl#"
 xml:base="http://bioinfo.icapture.ubc.ca/SADI"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:databases="http://sadiframework.org/ontologies/Databases.owl#"
 xmlns:misc="http://sadiframework.org/ontologies/miscellaneousObjects.owl#"
 xmlns:owl="http://www.w3.org/2002/07/owl#"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://sadiframework.org/services/getPubMedReferencesForPDB">
    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#serviceDescription"/>
    <b:format>sadi</b:format>
    <b:identifier>urn:lsid:myservices:getPubMedReferencesForPDB</b:identifier>
    <a:locationURI>http://sadiframework.org/services/getPubMedReferencesForPDB</a:locationURI>
    <a:hasServiceDescriptionText>A implementation of the 'getPubMedReferencesForPDB' service</a:hasServiceDescriptionText>
    <a:hasServiceDescriptionLocation>http://sadiframework.org/services/getPubMedReferencesForPDB</a:hasServiceDescriptionLocation>
    <a:hasServiceNameText>getPubMedReferencesForPDB</a:hasServiceNameText>
    <a:providedBy>
        <rdf:Description rdf:about="getPubMedReferencesForPDB_mark.ubic.ca_0">
            <a:authoritative>0</a:authoritative>
            <b:creator>markw@illuminae.com</b:creator>
            <b:publisher>mark.ubic.ca</b:publisher>
            <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#organisation"/>
        </rdf:Description>
    </a:providedBy>
    <a:hasOperation>
        <rdf:Description rdf:about="getPubMedReferencesForPDB_mark.ubic.ca_1">
            <a:hasOperationNameText>getPubMedReferencesForPDB</a:hasOperationNameText>
            <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#operation"/>
            <a:performsTask>
                <rdf:Description rdf:about="getPubMedReferencesForPDB_mark.ubic.ca_2">
                    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#operationTask"/>
                    <rdf:type rdf:resource="http://mygrid.org.uk/sometype"/>
                </rdf:Description>
            </a:performsTask>
            <a:inputParameter>
                <rdf:Description rdf:about="getPubMedReferencesForPDB_mark.ubic.ca_3">
                    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
                    <a:objectType>
                        <rdf:Description rdf:about="http://purl.oclc.org/SADI/LSRN/PDB_Thing"/>
                    </a:objectType>
                </rdf:Description>
            </a:inputParameter>
            <a:outputParameter>
                <rdf:Description rdf:about="getPubMedReferencesForPDB_mark.ubic.ca_4">
                    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
                    <a:objectType>
                        <rdf:Description rdf:about="http://sadiframework.org/ontologies/pdb_2_pmid.owl#getPubMedReferencesForPDB_Output"/>
                    </a:objectType>
                </rdf:Description>
            </a:outputParameter>
        </rdf:Description>
    </a:hasOperation>
  </rdf:Description>
</rdf:RDF>

A part of this service can be visualized as a graph:

So, the service getPubMedReferencesForPDB takes as input a parameter of type http://purl.oclc.org/SADI/LSRN/PDB_Thing and returns an object of type http://sadiframework.org/ontologies/pdb_2_pmid.owl#getPubMedReferencesForPDB_Output. OK, let's try this service: I'm going to invoke getPubMedReferencesForPDB with two structures of type http://purl.oclc.org/SADI/LSRN/PDB_Thing, 1KNZ and 1LJ2 (many thanks to Mark for helping me via Twitter :-P).

File: input.rdf

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:lsrn="http://purl.oclc.org/SADI/LSRN/">
        <lsrn:PDB_Thing rdf:about="http://lsrn.org/PDB:1KNZ"/>
        <lsrn:PDB_Thing rdf:about="http://lsrn.org/PDB:1LJ2"/>
</rdf:RDF>

Calling curl with this file for the service getPubMedReferencesForPDB:

curl -d @input.rdf http://sadiframework.org/services/getPubMedReferencesForPDB

Result:

<rdf:RDF>
  <rdf:Description rdf:about="http://lsrn.org/PDB:1LJ2">
    <a:hasReference>
      <rdf:Description rdf:about="http://lsrn.org/PMID:12086624">
        <rdf:type rdf:resource="http://purl.oclc.org/SADI/LSRN/PMID_Thing"/>
      </rdf:Description>
    </a:hasReference>
    <rdf:type rdf:resource="http://sadiframework.org/ontologies/pdb_2_pmid.owl#getPubMedReferencesForPDB_Output"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://lsrn.org/PDB:1KNZ">
    <a:hasReference>
      <rdf:Description rdf:about="http://lsrn.org/PMID:11792322">
        <rdf:type rdf:resource="http://purl.oclc.org/SADI/LSRN/PMID_Thing"/>
      </rdf:Description>
    </a:hasReference>
    <rdf:type rdf:resource="http://sadiframework.org/ontologies/pdb_2_pmid.owl#getPubMedReferencesForPDB_Output"/>
  </rdf:Description>
</rdf:RDF>

Here, for each PDB entry (input), the service returned a PubMed identifier (output).
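
The curl call above could also be made from Java. The following is a minimal sketch under assumptions, not a SADI client library: the class SadiClientSketch and its methods are hypothetical, and the "application/rdf+xml" content type is my guess at what the service expects (curl -d sends a form-encoded content type by default). The live call is left commented out because the service may not be reachable.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class SadiClientSketch {
    /** Builds the same kind of document as input.rdf: one lsrn:PDB_Thing per identifier. */
    static String buildInput(List<String> pdbIds) {
        StringBuilder sb = new StringBuilder();
        sb.append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n");
        sb.append(" xmlns:lsrn=\"http://purl.oclc.org/SADI/LSRN/\">\n");
        for (String id : pdbIds) {
            // identifiers are assumed to be plain alphanumeric, so no XML escaping is done
            sb.append("  <lsrn:PDB_Thing rdf:about=\"http://lsrn.org/PDB:")
              .append(id).append("\"/>\n");
        }
        return sb.append("</rdf:RDF>\n").toString();
    }

    /** POSTs the RDF to the service URL and returns the response body as text. */
    static String post(String serviceUrl, String rdf) throws Exception {
        HttpURLConnection con =
            (HttpURLConnection) URI.create(serviceUrl).toURL().openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        // assumption: the service accepts RDF/XML declared with this media type
        con.setRequestProperty("Content-Type", "application/rdf+xml");
        try (OutputStream out = con.getOutputStream()) {
            out.write(rdf.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = con.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        String rdf = buildInput(List.of("1KNZ", "1LJ2"));
        System.out.println(rdf);
        // Uncomment to call the service:
        // System.out.println(post(
        //     "http://sadiframework.org/services/getPubMedReferencesForPDB", rdf));
    }
}
```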

That's it !
Pierre

Another Tiny tool : RDF-to-Dot

RDFToDot transforms an RDF/XML input into Graphviz dot ( http://www.graphviz.org )

Usage

-h help; This screen.
-p {prefix} {uri} add this prefix mapping
(rdf stdin | rdf files | rdf urls )

Example

The following example renders my LinkedIn profile as a graph using an HTML canvas:

xsltproc --html linkedin2foaf.xsl http://www.linkedin.com/in/lindenbaum |\
java -jar rdf2dot.jar |\
dot -Tsvg |\
java -jar svg2canvas.jar > file.html

Download

http://code.google.com/p/lindenb/downloads/list

That's it
Pierre

12 February 2010

Processing large XML documents with XSLT

I've resurrected an old Java program called xsltstream, which might be useful for Biohackathon2010. This program applies an XSLT stylesheet only to the chosen elements of a large XML document: the document is read as a SAX stream, a DOM is built in memory for each target element, processed with XSLT and then disposed of. Now, say you want to transform an XML file from dbSNP with XSLT to make an RDF document. You cannot do that with xsltproc because the XML file is just too big (e.g. ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz is 1,099,375 KB).
But an XSLT stylesheet can be applied with xsltstream to all the <Rs> elements of 'ds_ch1.xml.gz':

java -jar xsltstream.jar -x 'http://lindenb.googlecode.com/svn/trunk/src/xsl/dbsnp2rdf.xsl' -q Rs \
'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz' |\
grep -v "rdf:RDF" | grep -v "<?xml version"
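
The per-element strategy (build a small DOM for one target element, transform it, throw it away) can be sketched with the standard Java XML APIs. This is not the xsltstream source: the class StreamingXsltSketch and its methods are hypothetical, it uses a StAX cursor where the original uses SAX, it ignores namespaces, and it takes strings and collects the results in a list only to keep the demo short (a real tool would read a stream and write each result at once).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class StreamingXsltSketch {
    /** Transforms every element named target on its own and returns the outputs. */
    static List<String> transformEach(String xml, String xslt, String target) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<String> results = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(target)) {
                    Document dom = builder.newDocument(); // small, short-lived DOM
                    copySubtree(reader, dom, dom);
                    StringWriter out = new StringWriter();
                    transformer.transform(new DOMSource(dom), new StreamResult(out));
                    results.add(out.toString());
                }
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Copies the element under the cursor, with its attributes, text and children. */
    private static void copySubtree(XMLStreamReader reader, Document dom, Node parent)
            throws XMLStreamException {
        Element element = dom.createElementNS(null, reader.getLocalName());
        for (int i = 0; i < reader.getAttributeCount(); i++) {
            element.setAttributeNS(null, reader.getAttributeLocalName(i),
                    reader.getAttributeValue(i));
        }
        parent.appendChild(element);
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                copySubtree(reader, dom, element);
            } else if (event == XMLStreamConstants.CHARACTERS) {
                element.appendChild(dom.createTextNode(reader.getText()));
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return; // end of the element copied by this call
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<ExchangeSet><Rs rsId=\"830\"><Het value=\"0.02\"/></Rs>"
                + "<Rs rsId=\"844\"><Het value=\"0.42\"/></Rs></ExchangeSet>";
        String xslt = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/Rs\">rs<xsl:value-of select=\"@rsId\"/></xsl:template>"
                + "</xsl:stylesheet>";
        transformEach(xml, xslt, "Rs").forEach(System.out::println);
    }
}
```

Memory use is bounded by the largest single target element rather than by the size of the whole document, which is the point of the approach.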

Result of the command above (excerpt):

(...)
<o:SNP rdf:about="http://www.ncbi.nlm.nih.gov/snp/830">
<dc:title>rs830</dc:title>
<o:taxon rdf:resource="http://www.ncbi.nlm.nih.gov/taxonomy/9606"/>
<o:het rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.02</o:het>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:WIAF"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:SNP500CANCER"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:SEQUENOM"/>
<o:hasMapping>
<o:Mapping>
<o:build rdf:resource="urn:void:ncbi:build:Celera/36_3"/>
<o:chrom rdf:resource="urn:void:ncbi:chromosome:9606/chr1"/>
<o:start rdf:datatype="http://www.w3.org/2001/XMLSchema#int">66444409</o:start>
<o:end rdf:datatype="http://www.w3.org/2001/XMLSchema#int">66444410</o:end>
<o:orient>+</o:orient>
</o:Mapping>
</o:hasMapping>
<o:hasMapping>
<o:Mapping>
<o:build rdf:resource="urn:void:ncbi:build:HuRef/36_3"/>
<o:chrom rdf:resource="urn:void:ncbi:chromosome:9606/chr1"/>
<o:start rdf:datatype="http://www.w3.org/2001/XMLSchema#int">66263806</o:start>
<o:end rdf:datatype="http://www.w3.org/2001/XMLSchema#int">66263807</o:end>
<o:orient>-</o:orient>
</o:Mapping>
</o:hasMapping>
<o:hasMapping>
<o:Mapping>
<o:build rdf:resource="urn:void:ncbi:build:reference/36_3"/>
<o:chrom rdf:resource="urn:void:ncbi:chromosome:9606/chr1"/>
<o:start rdf:datatype="http://www.w3.org/2001/XMLSchema#int">67926134</o:start>
<o:end rdf:datatype="http://www.w3.org/2001/XMLSchema#int">67926135</o:end>
<o:orient>+</o:orient>
</o:Mapping>
</o:hasMapping>
</o:SNP>

<o:SNP rdf:about="http://www.ncbi.nlm.nih.gov/snp/844">
<dc:title>rs844</dc:title>
<o:taxon rdf:resource="http://www.ncbi.nlm.nih.gov/taxonomy/9606"/>
<o:het rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.42</o:het>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:WIAF"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:LEE"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:HGBASE"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:SC_JCM"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:TSC-CSHL"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:LEE"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:YUSUKE"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:CGAP-GAI"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:CSHL-HAPMAP"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:PERLEGEN"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:ABI"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:SI_EXO"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:BCMHGSC_JDW"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:HUMANGENOME_JCVI"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:SNP500CANCER"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:1000GENOMES"/>
<o:hasHandle rdf:resource="urn:void:ncbi:snp:handle:ILLUMINA-UK"/>
<o:hasMapping>
<o:Mapping>
<o:build rdf:resource="urn:void:ncbi:build:Celera/36_3"/>
<o:chrom rdf:resource="urn:void:ncbi:chromosome:9606/chr1"/>
<o:start rdf:datatype="http://www.w3.org/2001/XMLSchema#int">134750981</o:start>
<o:end rdf:datatype="http://www.w3.org/2001/XMLSchema#int">134750982</o:end>
<o:orient>+</o:orient>
</o:Mapping>
</o:hasMapping>
<o:hasMapping>
<o:Mapping>
<o:build rdf:resource="urn:void:ncbi:build:HuRef/36_3"/>
<o:chrom rdf:resource="urn:void:ncbi:chromosome:9606/chr1"/>
<o:start rdf:datatype="http://www.w3.org/2001/XMLSchema#int">132892081</o:start>
<o:end rdf:datatype="http://www.w3.org/2001/XMLSchema#int">132892082</o:end>
<o:orient>-</o:orient>
(...)

The java archive for xsltstream is available at http://lindenb.googlecode.com/files/xsltstream.jar
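
The stream-then-dispose idea can be sketched in a few lines of Python. This is not the xsltstream source: it is a minimal sketch which assumes that the target elements are direct children of the root, matches them by local name, and calls a plain function (the hypothetical `handle`) where xsltstream would run the stylesheet. The sample document and its namespace are made up for the illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def stream_elements(source, qname, handle):
    """Call handle(elem) for every element named qname, then free it.

    Only one target subtree is held in memory at a time; handle stands
    in for the XSLT step (an assumption of this sketch).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    count = 0
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == qname:
            handle(elem)   # the subtree is complete at its 'end' event
            root.clear()   # dispose of everything parsed so far
            count += 1
    return count


# Made-up sample; the real dbSNP file is far larger.
SAMPLE = b"""<ExchangeSet xmlns="http://example.org/hypothetical">
  <Rs rsId="830"><Het value="0.02"/></Rs>
  <Rs rsId="844"><Het value="0.42"/></Rs>
</ExchangeSet>"""

seen = []
total = stream_elements(io.BytesIO(SAMPLE), "Rs",
                        lambda rs: seen.append("rs" + rs.get("rsId")))
print(total, seen)
```

Clearing the root after each target keeps memory use roughly constant, whatever the size of the input.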

Usage:

  -x <xslt-stylesheet file/url> required
  -p <param-name> <param-value> (add parameter to the xslt engine)
  -d depth (0 based) default:-1
  -q qName target default:null
   <file>|stdin

That's it !
Pierre

11 February 2010

Freebase and Biohackathon 2010

This post is a quick overview of Freebase for the participants of biohackathon2010 who attended François Belleau's lecture this morning. It was suggested here that Freebase could be used to store and share some predicates for the semantic web. In the current post, I'm going to use curl to programmatically add a new Namespace to Freebase.

OK. First, let's send an MQL query to Freebase and get the schema for the type 'Namespace'. We ask for the name, the id and the expected_type of all the properties of /user/biohackathon/default_domain/namespace_1:

curl http://www.freebase.com/api/service/mqlread -d 'query={
"query1":
{
"query":
[{
"id":"/user/biohackathon/default_domain/namespace_1",
"/type/type/properties": [{
"name": null,
"id":null,
"expected_type": null
}]
}]
}
}'

The response is:

{
"code": "/api/status/ok",
"result": [
{
"/type/type/properties": [
{
"expected_type": "/type/uri",
"id": "/user/biohackathon/default_domain/namespace_1/uri",
"name": "URI"
},
{
"expected_type": "/type/uri",
"id": "/user/biohackathon/default_domain/namespace_1/url",
"name": "Documentation URL"
}
],
"id": "/user/biohackathon/default_domain/namespace_1"
}
],
"status": "200 OK",
"transaction_id": "cache;cache01.p01.sjc1:8101;2010-02-11T17:46:20Z;0024"
}

OK, there are two properties for this 'Namespace': URI (a '/type/uri') and Documentation URL (also a '/type/uri'). Now we're going to insert a new Namespace: DOAP (Description of a Project). The URI for DOAP is http://usefulinc.com/ns/doap# and the Documentation URL is http://trac.usefulinc.com/doap.
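
For those who prefer a script to curl, here is a minimal Python sketch of the same read step. It only builds the form-encoded body that mirrors the curl command above and parses a reply of the shape shown above; it does not contact any server. The function names `mqlread_body` and `list_properties` are mine, not part of any Freebase client library.

```python
import json
from urllib.parse import parse_qs, urlencode

NAMESPACE_TYPE = "/user/biohackathon/default_domain/namespace_1"


def mqlread_body(type_id):
    """Form-encoded body equivalent to the curl -d argument above."""
    envelope = {
        "query1": {
            "query": [{
                "id": type_id,
                "/type/type/properties": [{
                    "name": None,  # None is serialized as JSON null
                    "id": None,
                    "expected_type": None,
                }],
            }]
        }
    }
    return urlencode({"query": json.dumps(envelope)})


def list_properties(response_text):
    """Return (name, expected_type) pairs from an mqlread reply."""
    reply = json.loads(response_text)
    return [(prop["name"], prop["expected_type"])
            for item in reply["result"]
            for prop in item["/type/type/properties"]]


# Abridged copy of the reply shown above.
REPLY = """{"code": "/api/status/ok", "result": [{
  "/type/type/properties": [
    {"expected_type": "/type/uri",
     "id": "/user/biohackathon/default_domain/namespace_1/uri",
     "name": "URI"},
    {"expected_type": "/type/uri",
     "id": "/user/biohackathon/default_domain/namespace_1/url",
     "name": "Documentation URL"}
  ],
  "id": "/user/biohackathon/default_domain/namespace_1"}]}"""

# Round trip: decode the body again to check what would be sent.
decoded = json.loads(parse_qs(mqlread_body(NAMESPACE_TYPE))["query"][0])
properties = list_properties(REPLY)
print(properties)
```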

Authenticate (only once)

curl -c cookies.txt -d "username=yourusername" -d "password=yourpassword" https://www.freebase.com/api/account/login
{
"code": "/api/status/ok",
"messages": [
{
"code": "/api/status/ok/account/login",
"message": "Login succeeded",
"username": "yourusername"
}
],
"status": "200 OK",
"transaction_id": "cache;xxxxx"
}

Insert the DOAP Namespace:

curl -b cookies.txt -H 'X-Requested-With: curl' https://www.freebase.com/api/service/mqlwrite -d 'query={"query":{
"create": "unless_exists",
"type":"/user/biohackathon/default_domain/namespace_1",
"id":null,
"name":"doap",
"/user/biohackathon/default_domain/namespace_1/uri":"http://usefulinc.com/ns/doap#",
"/user/biohackathon/default_domain/namespace_1/url":"http://trac.usefulinc.com/doap"
}}'

The answer from Freebase is:

{
"code": "/api/status/ok",
"result": {
"/user/biohackathon/default_domain/namespace_1/uri": "http://usefulinc.com/ns/doap#",
"/user/biohackathon/default_domain/namespace_1/url": "http://trac.usefulinc.com/doap",
"create": "created",
"id": "/guid/9202a8c04000641f8000000013e40eea",
"name": "doap",
"type": "/user/biohackathon/default_domain/namespace_1"
},
"status": "200 OK",
"transaction_id": "cache;cache02.p01.sjc1:8101;2010-02-11T18:47:45Z;0037"
}

You can now view this new Namespace at http://www.freebase.com/view/user/biohackathon/default_domain/views/namespace_1.
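
A script calling mqlwrite would probably want to check the reply before going on. The sketch below is only an illustration: it relies on the "code" and "create" fields visible in the answer above, and it treats any "create" value other than "created" as "nothing new was inserted", which is an assumption on my part. `created_id` is a hypothetical helper name.

```python
import json


def created_id(response_text):
    """Return the id of the new object if the write created it, else None."""
    reply = json.loads(response_text)
    if reply.get("code") != "/api/status/ok":
        raise ValueError("mqlwrite failed: %s" % reply.get("code"))
    result = reply["result"]
    # Assumption: only the value "created" means a new object was inserted.
    return result["id"] if result.get("create") == "created" else None


# Abridged copy of the answer shown above.
REPLY = """{
  "code": "/api/status/ok",
  "result": {
    "create": "created",
    "id": "/guid/9202a8c04000641f8000000013e40eea",
    "name": "doap",
    "type": "/user/biohackathon/default_domain/namespace_1"
  },
  "status": "200 OK"
}"""

new_id = created_id(REPLY)
print(new_id)
```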

That's it !
Pierre

Mapping RDBMS to RDF with D2RQ (yet another geeky title)

One of the coolest things I have seen here at Biohackathon 2010 is D2RQ (thank you Jan !):
D2RQ is a declarative language to describe mappings between relational database schemata and OWL/RDFS ontologies. The D2RQ Platform uses these mapping to enables applications to access a RDF-view on a non-RDF database.
In this post, I'll describe how I installed a D2RQ server.
First, download D2RQ:

wget http://downloads.sourceforge.net/project/d2rq-map/D2R%20Server/v0.7%20%28alpha%29/d2r-server-0.7.tar.gz
tar xfz d2r-server-0.7.tar.gz

Check if the Java MySQL driver is present in the lib folder (yes, it is):

ls lib/mysql-connector-java-5.1.7-bin.jar

Create one table in mysql describing some SNPs:

mysql> create table snp(id int unsigned primary key, name varchar(20) not null,avHet float);
Query OK, 0 rows affected (0.04 sec)

insert into snp(id,name,avHet) values (3210717,"rs3210717",0.2408);
insert into snp(id,name,avHet) values (1045871,"rs1045871",0.4278);
insert into snp(id,name,avHet) values (1045862,"rs1045862",0.2688);
insert into snp(id,name,avHet) values (17149433,"rs17149433",0.1958);
insert into snp(id,name,avHet) values (17149429,"rs17149429",0.1128);
insert into snp(id,name,avHet) values (16925319,"rs16925319",0.2822);
insert into snp(id,name,avHet) values (17353727,"rs17353727",0.495);
insert into snp(id,name,avHet) values (17157186,"rs17157186",0.4118);
insert into snp(id,name,avHet) values (3210688,"rs3210688",0.1638);
insert into snp(id,name,avHet) values (17157183,"rs17157183",0.4422);

Now call generate-mapping to generate the mapping between MySQL and RDF:

./generate-mapping -u root -d com.mysql.jdbc.Driver -o mapping.n3 -b "my:bio:database" "jdbc:mysql://localhost/test"

Here is the file mapping.n3 that was generated:

@prefix map: <file:/home/pierre/tmp/D2RQ/d2r-server-0.7/mapping.n3#> .
@prefix db: <> .
@prefix vocab: <my:bio:databasevocab/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix jdbc: <http://d2rq.org/terms/jdbc/> .

map:database a d2rq:Database;
d2rq:jdbcDriver "com.mysql.jdbc.Driver";
d2rq:jdbcDSN "jdbc:mysql://localhost/test";
d2rq:username "root";
jdbc:autoReconnect "true";
jdbc:zeroDateTimeBehavior "convertToNull";
.

# Table snp
map:snp a d2rq:ClassMap;
d2rq:dataStorage map:database;
d2rq:uriPattern "snp/@@snp.id@@";
d2rq:class vocab:snp;
d2rq:classDefinitionLabel "snp";
.
map:snp__label a d2rq:PropertyBridge;
d2rq:belongsToClassMap map:snp;
d2rq:property rdfs:label;
d2rq:pattern "snp #@@snp.id@@";
.
map:snp_id a d2rq:PropertyBridge;
d2rq:belongsToClassMap map:snp;
d2rq:property vocab:snp_id;
d2rq:propertyDefinitionLabel "snp id";
d2rq:column "snp.id";
d2rq:datatype xsd:int;
.
map:snp_name a d2rq:PropertyBridge;
d2rq:belongsToClassMap map:snp;
d2rq:property vocab:snp_name;
d2rq:propertyDefinitionLabel "snp name";
d2rq:column "snp.name";
.
map:snp_avHet a d2rq:PropertyBridge;
d2rq:belongsToClassMap map:snp;
d2rq:property vocab:snp_avHet;
d2rq:propertyDefinitionLabel "snp avHet";
d2rq:column "snp.avHet";
d2rq:datatype xsd:float;
.
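
To read the generated mapping, it helps to see how a pattern such as "snp/@@snp.id@@" turns one row of the table into a resource name and a label. The Python sketch below is a simplified illustration, not D2RQ's own code: it only substitutes the @@table.column@@ placeholders and ignores URL encoding and the base URI. `expand` is a hypothetical helper.

```python
import re

# A placeholder looks like @@table.column@@
PLACEHOLDER = re.compile(r"@@([^@]+)@@")


def expand(pattern, row):
    """Replace each @@table.column@@ placeholder with the row's value."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), pattern)


# First row inserted into the snp table above.
row = {"snp.id": 3210717, "snp.name": "rs3210717", "snp.avHet": 0.2408}

uri = expand("snp/@@snp.id@@", row)     # d2rq:uriPattern of map:snp
label = expand("snp #@@snp.id@@", row)  # d2rq:pattern of map:snp__label
print(uri)
print(label)
```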

...now, start the d2r-server:

./d2r-server -p 8080 mapping.n3
03:15:03 INFO log :: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
03:15:03 INFO log :: jetty-6.1.10
03:15:03 INFO log :: NO JSP Support for , did not find org.apache.jasper.servlet.JspServlet
03:15:03 INFO D2RServer :: using config file: file:/home/pierre/tmp/D2RQ/d2r-server-0.7/ucsc.n3
(...)

Open your web browser at http://localhost:8080/snorql/ and TADA!!!!!!!! Here is a functional SPARQL engine mapping your database.
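
The server can also be queried from a script. The following Python sketch builds a SPARQL protocol request for the SNPs with an average heterozygosity above 0.4, using the vocab: prefix and the property names from mapping.n3. The endpoint path /sparql and the JSON results format are assumptions about this installation, so check them against your server before relying on this.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:8080/sparql"  # assumed endpoint path

QUERY = """
PREFIX vocab: <my:bio:databasevocab/resource/>
SELECT ?snp ?name ?het WHERE {
  ?snp a vocab:snp ;
       vocab:snp_name ?name ;
       vocab:snp_avHet ?het .
  FILTER (?het > 0.4)
}
ORDER BY ?name
"""


def build_request(endpoint, query):
    """Build a GET request carrying the query, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint, query):
    """Send the query and return one {variable: value} dict per result row."""
    with urlopen(build_request(endpoint, query)) as response:
        data = json.load(response)
    return [{name: cell["value"] for name, cell in binding.items()}
            for binding in data["results"]["bindings"]]


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# With the server running: for row in run_query(ENDPOINT, QUERY): print(row)
```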

FTW !

That's it !
Pierre
