11 August 2006

The Life Sciences Semantic Web is Full of Creeps!

An article published in "Briefings in Bioinformatics Advance Access".

Benjamin M. Good and Mark D. Wilkinson
Abstract: The Semantic Web for the Life Sciences (SWLS), when realized, will dramatically improve our ability to conduct bioinformatics analyses using the vast and growing stores of web-accessible resources. This ability will be achieved through the widespread acceptance and application of standards for naming, representing, describing and accessing biological information. The W3C-led Semantic Web initiative has established most, if not all, of the standards and technologies needed to achieve a unified, global SWLS. Unfortunately, the bioinformatics community has, thus far, appeared reluctant to fully adopt them. Rather, we are seeing what could be described as ‘semantic creep’--timid, piecemeal and ad hoc adoption of parts of standards by groups that should be stridently taking a leadership role for the community. We suggest that, at this point, the primary hindrances to the creation of the SWLS may be social rather than technological in nature, and that, like the original Web, the establishment of the SWLS will depend primarily on the will and participation of its consumers.

Mark Wilkinson is one of the creators of BioMoby, a system for interoperability between biological data hosts and analytical services. Benjamin Good is a PhD student in the British Columbia Strategic Training Program in Bioinformatics. Both of them have a profile on Connotea (users bgood and mwilkinson; group: Wilkinson Laboratory).

Although I'm convinced that the semantic web/RDF/XML model is the format of choice for any application (please! use it for your output format!), I admit I never had the time or the technical knowledge about web services to really understand how BioMoby works, why I should use it, and why I should use an LSID instead of a good old URI... :-)
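For readers who have never seen one, an LSID is a URN rather than a web address. Here is a minimal sketch in Python, assuming the usual `urn:lsid:authority:namespace:object[:revision]` layout; the `LSID` type, the `parse_lsid` helper and the sample identifier are hypothetical names of mine for illustration, not part of BioMoby or of any LSID library.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of a Life Science Identifier (illustrative type, not a library API)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts.

    Assumption: none of the components contains a colon.
    """
    parts = text.split(":")
    # The 'urn' and 'lsid' prefixes are matched case-insensitively.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


if __name__ == "__main__":
    # Sample identifier chosen only for illustration.
    print(parse_lsid("urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434"))
```

Unlike a plain URL, the identifier itself says nothing about where the record lives; it only names an authority, a namespace and an object, and a separate resolution step is needed to fetch the data or metadata.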

Anonymous said...

Hi Pierre,

Well, what did you think??

Let's hear it!

-Ben (Creep 1)

Pierre said...

I've just read your nice paper, which contains a very good analysis of how people use or avoid using the SWLS.
About LSID, let's be honest: although I see the interest of RDF, I (think I) don't need the LSID in my job, mostly because the people from my lab talk about "their" genes, they ask questions about some verbose identifiers (e.g. "interleukin receptor 3"), and they get some non-URI identifiers from a high number of non-SW databases (858 databases in NAR 2006!!! http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3 ).

All those technologies require a high level of knowledge in informatics (XML, semantic web, Java, OWL, web services, ...) whereas many people I know are "just" biologists with some good skills in C and Perl and a basic knowledge of Java SE. It's far easier and more straightforward for them to use 'awk' on a tab-delimited file than to parse an ugly RDF file...

A good way to promote LSID & RDF in the community would be to convince the NCBI to use them (an RDF version of GenBank, GenBank identifiers as LSIDs, FOAF-ing the authors, etc.).