31 July 2007

X:Map, a Genome Browser

Tim Yates is one of the latest members to join the bioinformatics group on 'Nature Network'. Dr Yates works as a Research Programmer at the Paterson Institute for Cancer Research. His web page introduces X:Map: an interactive, real-time scrollable genome browser that shows the location of individual exon probes with respect to their target genes, transcripts and exons.

X:Map is a genome browser (http://xmap.picr.man.ac.uk/) that uses the Google Maps API and data from Ensembl. The result is really neat.

See also: AJAXification of genome browsers on NN.

19 July 2007

Scifoo 07: anxiety from a homebody

I'm arriving at SFO at 22H00 on the 2nd and I'll be back at SFO on the 6th at 10H15.

See you there ! :-)


Seven deadly sins of bioinformatics

Via NodalPoint.
Keynote talk by Carole Goble at the BOSC SIG of ISMB 2007 in Vienna, July 2007.

17 July 2007

Inside LSID

This post contains my notes about LSID, but it has nothing to do with the current "LSID wars".

There is little or no good documentation about Life Science Identifiers (LSID). Did you ever try to read the specs? Ouch... I'm a biologist, not a network engineer. Fortunately, the sources of the Firefox add-on for LSID were very informative: they show what happens when you enter an LSID in the browser. (Note: Roderic Page has also implemented his own Firefox extension for LSID, see http://lsid.mozdev.org/)

Say you have an LSID:

The third part of this URI (ubio.org) is called the authority. By default, the plugin looks at http://ubio.org:9090/authority to find a "WSDL" file.
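
To make the default lookup concrete, here is a minimal sketch in JavaScript. It assumes the usual urn:lsid:authority:namespace:object[:revision] layout; the function names parseLsid and defaultAuthorityUrl are made up for this illustration and are not taken from the add-on's sources.

```javascript
// Split an LSID such as "urn:lsid:ubio.org:namebank:11815" into its parts.
// Hypothetical helper, written for this post only.
function parseLsid(lsid) {
  var parts = String(lsid).split(":");
  // "urn" and "lsid" are compared case-insensitively ("urn:LSID:" is also seen in the wild).
  if (parts.length < 5 || parts.length > 6 ||
      parts[0].toLowerCase() !== "urn" ||
      parts[1].toLowerCase() !== "lsid") {
    throw new Error("Not an LSID: " + lsid);
  }
  return {
    authority: parts[2],   // third part of the URI
    namespace: parts[3],
    object: parts[4],
    revision: parts.length === 6 ? parts[5] : null // optional sixth part
  };
}

// Default location probed for the authority WSDL, as described above.
function defaultAuthorityUrl(lsid) {
  return "http://" + parseLsid(lsid).authority + ":9090/authority";
}
```

For example, defaultAuthorityUrl("urn:lsid:ubio.org:namebank:11815") gives the http://ubio.org:9090/authority address mentioned above. Whether a server actually answers there is another matter.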

OK, and this is where I ran into a problem: the default behavior of the Firefox add-on failed in most of the cases I tested.
For instance, dcc.hapmap.org:9090/authority does not work with urn:LSID:dcc.hapmap.org:Individual:JA18942:1. So I guess there must be a way to find this authority from the LSID itself (biomoby?), but I still have not found how.

Here I added the server http://www.ubio.org/authority in the preferences of the add-on, just because I found the URL by chance.

Here is the WSDL file found at http://www.ubio.org/authority:

<?xml version="1.0"?>
<wsdl:definitions xmlns:tns="http://www.hyam.net/lsid/Authority"

<import namespace="http://www.omg.org/LSID/2003/AuthorityServiceHTTPBindings" location="LSIDAuthorityServiceHTTPBindings.wsdl" />

<wsdl:service name="MyAuthorityHTTPService">
<wsdl:port name="MyAuthorityHTTPPort" binding="httpsns:LSIDAuthorityHTTPBinding">
<httpsns:address location="http://www.ubio.org/authority/index.php" />


The prefix associated with the namespace http://www.omg.org/LSID/2003/AuthorityServiceHTTPBindings is then looked up (there may be bindings other than HTTP: SOAP, FTP...). Here it is httpsns. The <wsdl:port> element whose "binding" attribute contains "httpsns:LSIDAuthorityHTTPBinding" has a child element with a "location" attribute. Here the value of "location" is "http://www.ubio.org/authority/index.php". We then ask for information about our LSID from this URL by appending a parameter "lsid=[the-lsid]" to it: http://www.ubio.org/authority/index.php?lsid=urn:lsid:ubio.org:namebank:11815. The result is, again, a WSDL file:

<?xml version="1.0"?>
<definitions xmlns:tns="http://www.example.org/SampleDataServices"

<import namespace="http://www.omg.org/LSID/2003/DataServiceHTTPBindings" location="LSIDDataServiceHTTPBindings.wsdl" />

<!-- Example HTTP GET Services (urlEncoding) -->
<service name="MyDataHTTPService">
<port name="MyDataServiceHTTPPort" binding="httpsns:LSIDDataHTTPBinding">
<http:address location="http://www.ubio.org/authority/data.php" />
<service name="MyMetadataHTTPService">
<port name="MyMetadataServiceHTTPPort" binding="httpsns:LSIDMetadataHTTPBinding">
<http:address location="http://www.ubio.org/authority/metadata.php" />

Again, from this XML file we obtain two URLs: http://www.ubio.org/authority/data.php and http://www.ubio.org/authority/metadata.php, used respectively to fetch the data and the metadata about the LSID.
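
The two lookups above can be sketched as follows. This is only an illustration: findPortLocation and lsidUrl are hypothetical names, and the regular expressions are a shortcut that works on the two WSDL files shown in this post. Real code should use a proper XML parser and resolve the namespace prefix as described above.

```javascript
// Return the "location" of the first address found inside the <port>
// (or <wsdl:port>) whose binding attribute names the given binding,
// or null when there is no such port. Hypothetical helper.
function findPortLocation(wsdlText, bindingName) {
  // Captures the binding name without its prefix, then the text up to the next port.
  var portRe = /<(?:\w+:)?port\b[^>]*\bbinding="(?:\w+:)?([^"]+)"[^>]*>([\s\S]*?)(?=<(?:\w+:)?port\b|$)/g;
  var m;
  while ((m = portRe.exec(wsdlText)) !== null) {
    if (m[1] === bindingName) {
      var loc = /\blocation="([^"]+)"/.exec(m[2]);
      return loc ? loc[1] : null;
    }
  }
  return null;
}

// Append the "lsid" parameter to a service location, as done in this post
// (the LSID is appended as-is, without URL encoding).
function lsidUrl(location, lsid) {
  return location + "?lsid=" + lsid;
}
```

With the second WSDL file above, findPortLocation(wsdl, "LSIDMetadataHTTPBinding") returns the metadata.php address, and lsidUrl() turns it into the URL used to fetch the RDF.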

http://www.ubio.org/authority/data.php?lsid=urn:lsid:ubio.org:namebank:11815 returns
http://www.ubio.org/authority/data.php?lsid=urn:lsid:ubio.org:namebank:11815 (???)

Here is the Metadata/RDF file fetched from http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:11815
<?xml version="1.0" encoding="utf-8"?>

<rdf:Description rdf:about="urn:lsid:ubio.org:namebank:11815">
<dc:creator rdf:resource="http://www.ubio.org"/>
<dc:subject>Pternistis leucoscepus (Gray, GR) 1867</dc:subject>
<ubio:canonicalName>Pternistis leucoscepus</ubio:canonicalName>
<dc:title>Pternistis leucoscepus</dc:title>
<dc:type>Scientific Name</dc:type>
<ubio:lexicalStatus>Unknown (Default)</ubio:lexicalStatus>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:954940"/>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:954941"/>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:1564236"/>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:783787"/>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:1580313"/>
<gla:mapping rdf:resource="http://starcentral.mbl.edu/microscope/portal.php?pagetitle=classification&amp;BLCHID=12-4498"/>
<gla:mapping rdf:resource="http://www.cbif.gc.ca/pls/itisca/next?v_tsn=553857&amp;taxa=&p_format=&p_ifx=cbif&p_lang="/>
<gla:hasBasionym rdf:resource="urn:lsid:ubio.org:namebank:12292"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:12292"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:1762007"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:1762032"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:1762051"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:3408791"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1116259"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1137821"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1173817"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1174615"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1416177"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1672192"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:2233032"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:12798879"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1909656"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:2304281"/>
<dcterms:bibliographicCitation>Sclater, W.L., Systema Avium Æthiopicarum, p. 91</dcterms:bibliographicCitation>

13 July 2007

NAR, Web Server issue July 2007

The annual "Web Server Issue" of "Nucleic Acids Research" is available at: http://nar.oxfordjournals.org/content/vol35/suppl_2/index.dtl?etoc. This issue reports on 130 web servers.


URL +1, LSID -1

"URL +1, LSID -1" is the name of a current thread on "public-semweb-lifesci".
This discussion (worth a look) is about the Life Science Identifier (LSID); it was started by Eric Jain:

In the latest release of UniProt (11.3), all URIs of the form:


have been replaced with URLs:


In general, these URLs can be resolved to a human readable web page (a few are still broken, will be fixed). Some of these web pages may (or may not) be linked to a machine-readable representation via link-rel=alternate.

As an optimization for "Semantic Web" crawlers, there is experimental support for "Accept" headers (i.e. set it to "application/rdf+xml").

Some examples:


Among the protagonists we find Roderic Page, Michel Dumontier, Mark Wilkinson, Alan Ruttenberg, Danny Ayers, etc.

Life Science Identifiers (LSIDs) are persistent, location-independent, resource identifiers for uniquely naming biologically significant resources including species names, concepts, occurrences, genes or proteins, or data objects that encode information about them. To put it simply, LSIDs are a way to identify and locate pieces of biological information on the web.

As far as I understand LSID, we should all use urn:lsid:ncbi.nlm.nih.gov:pubmed:12507336 instead of http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=12507336&ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum or http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&uid=12507336. (Note that these two URLs are not the same, but they point to the same article.) An LSID resolver can also be used to discover other (RDF-based) properties of your object.

In the thread, a Firefox extension resolving LSID URIs was described. I just installed it on my Firefox and it looks nice. The code is really interesting too: it shows how to create a Firefox extension that inserts a handler for a new internet protocol named "lsidres:".

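LsidModule.registerSelf = function (compMgr, location, loaderStr, type){

// http://developer.mozilla.org/xpcom/api/nsIComponentRegistrar/
compMgr = compMgr.QueryInterface(Components.interfaces.nsIComponentRegistrar);
// The start of this call was lost in the excerpt. LSID_CLASS_ID and
// LSID_CONTRACT_ID are placeholder names for the extension's own constants.
compMgr.registerFactoryLocation(LSID_CLASS_ID,
"Protocol handler for LSID",
LSID_CONTRACT_ID,
location, loaderStr, type);
};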

Then, when a hyperlink in an HTML page (such as lsidres:urn:lsid:ubio.org:namebank:11815) is activated, Firefox opens a new window, calls a remote LSID resolver and displays the properties of your object.


04 July 2007

Systems-Biology using GoogleGears: my notebook

Google Gears is an open source browser extension that enables web applications to provide offline functionality. The data are stored locally in a fully-searchable relational database using the SQLite engine.

My Biological Network is a tool I created as a test to play with Google Gears: it is used to build a network of protein-protein interactions. It uses Google Gears to record your entries on the local disk, so Gears needs to be installed on your computer. Programming with Gears in JavaScript is really cool: you don't have to implement the storage of the data on the server side, and you use standard SQL statements to handle the data.


My Biological Network


Open the tab Organism (fig. 4): add one or more organisms (Homo Sapiens is already inserted by default).

Open the tab Protein (fig. 1): add one or more proteins.

Open the tab Paper (fig. 3): add one or more articles that will be used as evidence for an interaction.

Open the tab Technology (fig. 2): add one or more technologies that were used to characterize an interaction.

Open the tab Component: add one or more cellular components using Gene Ontology (GO:0005575 "cellular component" is inserted by default).

Open the tab Interaction (fig. 5):

  • Name and describe this interaction.

  • Select one or more proteins and/or one or more previously defined protein complexes. You cannot describe self-interactions with this tool.

  • (optional) Choose one or more papers/technologies/components...

Open the RDF table (fig. 6): I chose to display the content of the database using RDF. This format can then be validated and visualized using the W3C RDF validator, or transformed using XSLT, etc. I also used the Life Science Identifier (LSID) as a URI for my resources.

On my computer, the database is stored in /env/islande/home/lindenb/.mozilla/firefox/<profile-id>/Google Gears for Firefox/islande/<host>/mynetwork#database. The database can be accessed manually using sqlite3:

sqlite3 mynetwork#database
SQLite version 3.4.0
Enter '.help' for instructions
sqlite> .tables
component        interactionhash  paper            technology
interaction      organism         protein
sqlite> .schema organism
CREATE TABLE organism(id integer primary key ,name varchar(50) not null unique);
sqlite> select * from organism;
9606|Homo Sapiens


When the page is loaded, we check that Gears is installed:

if (!window.google || !google.gears) {
debug("NOTE: You must install Google Gears first.");
return; // we are inside init(): nothing else can work without Gears
}

We then create the database if it does not exist. In Firefox, the file is created in ${HOME}/.mozilla/firefox/<profile-id>/Google Gears for Firefox/<server>/mynetwork#database

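connection = google.gears.factory.create("beta.database","1.0");
// Open the database (created on first use); "mynetwork" matches the mynetwork#database file name above.
connection.open("mynetwork");

I create the tables just by invoking standard SQL 'CREATE TABLE' statements. I also insert some default values (e.g. the human organism).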

connection.execute("create table if not exists organism(id integer primary key ,name varchar(50) not null unique)");
connection.execute("insert or ignore into organism(id,name) values(9606,\"Homo Sapiens\")");
connection.execute("create table if not exists protein(id integer primary key autoincrement,name varchar(50) not null,taxId int not null,acn varchar(50) not null unique)");
connection.execute("create table if not exists paper(pmid integer primary key ,title varchar(255) not null,citation varchar(255) not null,firstAuthor varchar(50) not null)");
connection.execute("create table if not exists component(id integer primary key autoincrement,go varchar(50) not null unique, name varchar(50) not null unique)");

connection.execute("insert or ignore into component(go,name) values(\"GO:0005575\",\"cellular component\")");
connection.execute("insert or ignore into component(go,name) values(\"GO:0008372\",\"cellular component unknown\")");

connection.execute("create table if not exists technology(id integer primary key autoincrement,name varchar(50) not null unique, description varchar(255) not null)");

connection.execute("insert or ignore into technology(name,description) values(\"Y2H\",\"Yeast Two Hybrid System\")");
connection.execute("insert or ignore into technology(name,description) values(\"CoIP\",\"Co-Immuno Precipitation\")");

connection.execute("create table if not exists interaction(id integer primary key autoincrement, name varchar(50) not null unique,description varchar(255) not null)");
connection.execute("create table if not exists interactionhash(id integer primary key autoincrement,LINK_interaction int ,type varchar(20) not null,child int not null)");

Before a record is inserted, we check all the fields, then insert them using an SQL INSERT INTO statement:

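function addOrganism() {
var id= getById("organism-input-id");
// reconstructed check: the excerpt only kept the error message
if (isNaN(parseInt(trim(id.value), 10))) { debug("TaxId not a Number"); return; }
var name=getById("organism-input-name");
// reconstructed check, same remark
if (trim(name.value).length == 0) { debug("Taxon Name empty"); return; }

connection.execute("insert into organism(id,name) values("+sqlescape(trim(id.value))+","+sqlquote(trim(name.value))+")");
}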
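
The helpers trim, sqlescape and sqlquote are not shown in this post. As a rough idea of what such helpers can look like, here is a minimal sketch; trimText, toSqlInteger, quoteSqlString and insertOrganismSql are hypothetical stand-ins written for this illustration, not the original functions.

```javascript
// Remove leading and trailing whitespace (works in old browsers too).
function trimText(s) {
  return String(s).replace(/^\s+|\s+$/g, "");
}

// Accept only a plain integer, so that nothing else can reach the SQL string.
function toSqlInteger(s) {
  var t = trimText(s);
  if (!/^-?\d+$/.test(t)) {
    throw new Error("Not an integer: " + s);
  }
  return t;
}

// Wrap a value in single quotes and double any embedded single quote,
// which is how SQL string literals escape a quote.
function quoteSqlString(s) {
  return "'" + String(s).replace(/'/g, "''") + "'";
}

// Build the INSERT statement for one organism.
function insertOrganismSql(id, name) {
  return "insert into organism(id,name) values(" +
    toSqlInteger(id) + "," + quoteSqlString(trimText(name)) + ")";
}
```

Gears' execute() also accepts an array of bind parameters as a second argument, e.g. connection.execute("insert into organism(id,name) values(?,?)", [id, name]), which avoids building the SQL string by hand.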

A simple SELECT is used to retrieve the data and insert them in an HTML table:

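var rs= connection.execute("select id,name from organism order by name");
while (rs.isValidRow()) {
var tr= ce("tr");
var td= ce("td");
// ... (elided in this excerpt: fill the first cell with the taxon id)

td= ce("td");
var a= ce("a");
a.setAttribute("title","Open in NCBI");
// ... (elided in this excerpt: set the link target and text, append the cells to the row)
rs.next(); // move to the next row, otherwise the loop never ends
}
rs.close();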
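
The isValidRow()/next()/close() pattern can be tried outside the browser. The sketch below collects the rows of a Gears-style result set into an array; readAll and fakeResultSet are hypothetical names, and the fake result set is only an in-memory stand-in that mimics the few ResultSet methods used here.

```javascript
// Collect every row of a Gears-style ResultSet into an array of objects,
// one property per column.
function readAll(rs) {
  var rows = [];
  try {
    while (rs.isValidRow()) {
      var row = {};
      for (var i = 0; i < rs.fieldCount(); i++) {
        row[rs.fieldName(i)] = rs.field(i);
      }
      rows.push(row);
      rs.next();
    }
  } finally {
    rs.close(); // always release the result set
  }
  return rows;
}

// In-memory stand-in for a ResultSet (assumption: only for testing the loop).
function fakeResultSet(names, data) {
  var pos = 0;
  return {
    isValidRow: function () { return pos < data.length; },
    next: function () { pos++; },
    close: function () { pos = data.length; },
    fieldCount: function () { return names.length; },
    fieldName: function (i) { return names[i]; },
    field: function (i) { return data[pos][i]; }
  };
}
```

In the page itself, something like readAll(connection.execute("select id,name from organism order by name")) would return the rows to display, leaving the table-building code free of cursor handling.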

That's it !


updated 2010-08-12: source code

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="gears_init.js"></script>
<script type="text/javascript" src="network.js"></script>
<link rel="stylesheet" type="text/css" href="./network.css" />
<title>My Biological Network</title>
</head>
<body onload="init()">
<h1>My Biological Network</h1>
<p>Pierre Lindenbaum PhD <a href="mailto:plindenbaum@yahoo.fr">plindenbaum@yahoo.fr</a><br/><a href="http://plindenbaum.blogspot.com">http://plindenbaum.blogspot.com</a><br/><address>Bioinformatics department<br/><a href="http://www.integragen.com">Integragen S.A.</a><br/>Evry, France</address></p>
<button onclick="javascript:showCard('home-pane');">Home</button>
<button onclick="showOrganismPane()">Organisms</button>
<button onclick="showProteinPane()">Proteins</button>
<button onclick="showPaperPane()">Papers</button>
<button onclick="showTechnologyPane()">Technology</button>
<button onclick="showComponentPane()">Component</button>
<button onclick="showInteractionPane()">Interactions</button>
<button onclick="showRDFPane()">RDF</button>
<div style="color:red;" id="stderr"></div>

<!-- ====================================== ORGANISM ====================================== -->
<div style="display:none;" id="organism-pane">
<table>
<caption>Add an Organism</caption>
<tr><th>NCBI Taxon ID <i>(e.g. 10912)</i></th><td><input id="organism-input-id" length="10"/></td></tr>
<tr><th>NCBI Taxon Name <i>(e.g. Rotavirus)</i></th><td><input id="organism-input-name" length="10"/></td></tr>
<tr><th/><td><button onclick="addOrganism()">Add</button></td></tr>
</table>

<table width="80%">
<caption>All Organisms</caption>
<tr><th>Taxon ID</th><th>Taxon Name</th></tr>
<tbody id="organism-table"></tbody>
</table>
</div>


<!-- ====================================== COMPONENT ====================================== -->
<div style="display:none;" id="component-pane">
<table>
<caption>Add a Component</caption>
<tr><th>Name</th><td><input id="component-input-name" length="10"/></td></tr>
<tr><th>GO</th><td><input id="component-input-go" length="10"/></td></tr>
<tr><th/><td><button onclick="addComponent()">Add</button></td></tr>
</table>

<table width="80%">
<caption>All Components</caption>
<tbody id="component-table"></tbody>
</table>
</div>


<!-- ====================================== TECHNOLOGY ====================================== -->
<div style="display:none;" id="technology-pane">
<table>
<caption>Add a Technology</caption>
<tr><th>Name</th><td><input id="technology-input-name" length="50"/></td></tr>
<tr><th>Description</th><td><input id="technology-input-desc" length="50"/></td></tr>
<tr><th/><td><button onclick="addTechnology()">Add</button></td></tr>
</table>

<table width="80%">
<caption>All Technologies</caption>
<tbody id="technology-table"></tbody>
</table>
</div>


<!-- ====================================== PROTEIN ====================================== -->

<div style="display:none;" id="protein-pane">
<table>
<caption>Add a Protein</caption>
<tr><th>Uniprot accession number <i>(e.g. Q3T8J2)</i></th><td><input id="protein-input-acn" length="10"/></td></tr>
<tr><th>Uniprot Name <i>(e.g. Replicase polyprotein 1ab)</i></th><td><input id="protein-input-name" length="10"/></td></tr>
<tr><th>Organism</th><td><select id="protein-input-taxon" length="10"><option>A</option></select></td></tr>
<tr><th/><td><button onclick="addProtein()">Add</button></td></tr>
</table>

<table width="80%">
<caption>All Proteins</caption>
<tr><th>Primary accession</th><th>Name</th><th>Taxon</th></tr>
<tbody id="protein-table"></tbody>
</table>
</div>


<!-- ====================================== PAPER ====================================== -->
<div style="display:none;" id="paper-pane">
<table>
<caption>Add a Paper</caption>
<tr><th>PMID</th><td><input id="paper-input-pmid" length="10"/></td></tr>
<tr><th>Title</th><td><input id="paper-input-title" length="50"/></td></tr>
<tr><th>Citation</th><td><input id="paper-input-citation" length="50"/></td></tr>
<tr><th>First Author</th><td><input id="paper-input-author" length="50"/></td></tr>
<tr><th/><td><button onclick="addPaper()">Add</button></td></tr>
</table>

<table width="80%">
<caption>All Papers</caption>
<tr><th>PMID</th><th>Citation</th><th>First Author</th><th>Title</th></tr>
<tbody id="paper-table"></tbody>
</table>
</div>


<!-- ====================================== INTERACTION ====================================== -->

<div style="display:none;" id="interaction-pane">
<table>
<caption>Add an Interaction</caption>
<tr><th>Name</th><td colspan="4"><input id="interaction-input-name" length="50"/></td></tr>
<tr><th>Description</th><td colspan="4"><input id="interaction-input-desc" length="50"/></td></tr>
<tr>
<td><select id="interactors-input-proteins" size="5" multiple="true"></select></td>
<td><select id="interactors-input-interactors" size="5" multiple="true"></select></td>
<td><select id="interactors-input-technologies" size="5" multiple="true"></select></td>
<td><select id="interactors-input-evidences" size="5" multiple="true"></select></td>
<td><select id="interactors-input-components" size="5" multiple="true"></select></td>
</tr>
<tr><th colspan="4"/><td><button onclick="addInteraction()">Add</button></td></tr>
</table>

<table width="80%">
<caption>All Interactions</caption>
<tbody id="interaction-table"></tbody>
</table>
</div>


<!-- ====================================== RDF ====================================== -->
<div style="display:none;" id="rdf-pane">
<h2>RDF Pane</h2>
<textarea wrap="off" id="rdf-area" rows="20" cols="80"></textarea>
</div>


<!-- ====================================== HOME ====================================== -->
<div style="display:none;" id="home-pane">
<h3>About My Biological Network</h3>
<p><a href="http://gears.google.com/">Google gears</a> is an open source browser extension that enables web applications to provide offline functionality. The data are stored locally in a fully-searchable relational database using the <a href="http://www.sqlite.org/">sqlite engine</a>.</p>
<p><b>My Biological Network</b> is a tool I created as a test to play with Google gears: it is used to build a network of protein-protein interactions. It uses Google Gears to record your entries on the <u>local disk</u>, so Gears needs to be installed on your computer. </p>

Open the tab <b>Organism</b>: add one or more organisms (Homo Sapiens is already inserted by default).<br/>
Open the tab <b>Protein</b>: add one or more proteins.<br/>
Open the tab <b>Paper</b>: add one or more articles that will be used as evidence for an interaction.<br/>
Open the tab <b>Technology</b>: add one or more technologies that were used to characterize an interaction.<br/>
Open the tab <b>Component</b>: add one or more cellular components using Gene Ontology (GO:0005575 "cellular component" is inserted by default).<br/>
Open the tab <b>Interaction</b>:<ul>
<li>Name and describe this interaction.</li>
<li>Select one or more proteins and/or one or more previously defined protein complexes. You <i>cannot</i> describe self-interactions with this tool.</li>
<li>(optional) Choose one or more papers/technologies/components...</li>
</ul>
Open the <b>RDF table</b>: I chose to display the content of the database using <a href="http://www.w3.org/RDF/">RDF</a>. This format can then be validated and visualized using the <a href="http://www.w3.org/RDF/Validator/">W3C RDF validator</a>, or transformed using <a href="http://www.w3.org/TR/xslt">XSLT</a>, etc. I also used the <a href="http://lsid.sourceforge.net/">Life Science Identifier (LSID)</a> as a URI for my resources.<br/>


<p>On my computer, the database is stored in <code>$HOME/.mozilla/firefox/&lt;profile-id&gt;/Google Gears for Firefox/&lt;host&gt;/mynetwork#database</code>. The database can be accessed manually using <a href="http://www.sqlite.org/">sqlite3</a>:</p>
<pre style='color:black;border:1pt solid;background:lightgray;'>sqlite3 mynetwork#database
SQLite version 3.4.0
Enter &apos;.help&apos; for instructions
sqlite&gt; .tables
component        interactionhash  paper            technology
interaction      organism         protein
sqlite&gt; .schema organism
CREATE TABLE organism(id integer primary key ,name varchar(50) not null unique);
sqlite&gt; select * from organism;
9606|Homo Sapiens
</pre>
</div>
</body>
</html>


