Showing posts with label extension. Show all posts

16 October 2014

IGVFox: Integrative Genomics Viewer control through mozilla Firefox

I've just pushed IGVFox 0.1, a Firefox add-on that controls IGV, the Integrative Genomics Viewer.
This add-on lets users set the genomic position shown in IGV just by clicking a hyperlink in an HTML page. The source code is available on GitHub at https://github.com/lindenb/igvfox and a first release is available as a *.xpi file at https://github.com/lindenb/igvfox/releases.
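For context, IGV can listen for commands on a local port (60151 by default) and accepts HTTP requests of the form goto?locus=... there. The sketch below only builds such a link. Whether IGVFox talks to IGV exactly this way is not stated in this post, and igvGotoUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: build a link asking a locally running IGV to jump
// to a genomic interval. Assumes IGV's command port is enabled on 60151.
function igvGotoUrl(chrom, start, end, port = 60151) {
  if (!/^[\w.]+$/.test(chrom)) {
    throw new RangeError("unexpected chromosome name: " + chrom);
  }
  if (!Number.isInteger(start) || !Number.isInteger(end) || start > end) {
    throw new RangeError("start and end must be integers with start <= end");
  }
  return `http://localhost:${port}/goto?locus=${chrom}:${start}-${end}`;
}

console.log(igvGotoUrl("chr2", 98987, 9879899));
```

Rejecting odd chromosome names and reversed intervals up front keeps a malformed hyperlink from sending a meaningless command to the viewer.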


That's it,

Pierre

08 August 2014

A GNU-make plug-in for the #Illumina FASTQs.

The latest version of GNU-Make http://www.gnu.org/software/make/ provides many advanced capabilities, including many useful functions. However, it does not contain a complete programming language and so it has limitations. Sometimes these limitations can be overcome through use of the shell function to invoke a separate program, although this can be inefficient. On systems which support dynamically loadable objects, you can write your own extension in any language (which can be compiled into such an object) and load it to provide extended capabilities ( see http://www.gnu.org/software/make/manual/make.html#Loading-Objects )

Building a plug-in for the Illumina FASTQs.

from http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm

Illumina FASTQ files use the following naming scheme:

<sample name>_<barcode sequence>_L<lane (0-padded to 3 digits)>_R<read number>_<set number (0-padded to 3 digits)>.fastq.gz

For example, the following is a valid FASTQ file name:

NA10831_ATCACG_L002_R1_001.fastq.gz
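The naming scheme above can be checked with a single regular expression. This is a minimal sketch in JavaScript, not the C code of the plug-in; parseIlluminaName and the field names are hypothetical, and the lane and read keep their L and R prefixes.

```javascript
// Hypothetical helper: split an Illumina FASTQ file name into its five
// components, or return null when the name does not follow the scheme.
const ILLUMINA_NAME = /^(.+)_([^_]+)_(L\d{3})_(R\d+)_(\d{3})\.fastq\.gz$/;

function parseIlluminaName(filename) {
  const m = ILLUMINA_NAME.exec(filename);
  if (m === null) return null;
  return { sample: m[1], barcode: m[2], lane: m[3], read: m[4], set: m[5] };
}

console.log(parseIlluminaName("NA10831_ATCACG_L002_R1_001.fastq.gz"));
console.log(parseIlluminaName("hello")); // null
```

Returning null for anything that does not match lets a caller skip stray words instead of guessing at their parts.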

Here I'm writing a set of new make functions to extract the different parts (sample, lane...) of a FASTQ file name:

The code is available on github.com at

First a struct holding the parts of the file is created:

enum E_IlluminaComponent
    {
    E_sampleName,
    E_barcodeSequence,
    E_lane,
    E_readNumber,
    E_setNumber,
    NUM_ILLUMINA_COMPONENTS /* number of components, used to size the array */
    };

typedef struct illumina_scheme_t
    {
    char* filename;
    char* components[NUM_ILLUMINA_COMPONENTS];
    } IlluminaScheme,*IlluminaSchemePtr ;

and a function parsing the filenames is created:

IlluminaSchemePtr IlluminaSchemeNew(const char* filename)
    {
    ...
    }

When the plugin illumina is loaded as a dynamic C library, the function illumina_gmk_setup is called,
and we tell make about the new functions with gmk_add_function(name,callback,min_args,max_args,no_expand_content):

int illumina_gmk_setup ()
  {
   gmk_add_function ("illumina_sample",illumina_sample, 1, 1, 0);
   gmk_add_function ("illumina_lane",illumina_lane, 1, 1, 0);
   (...)
   return 1; /* non-zero tells make that the setup succeeded */
  }

A function registered with make must match the gmk_func_ptr type.
It will be invoked with three parameters: name (the name of the function), argc (the number of arguments to the function), and argv (an array of pointers to arguments to the function). The last pointer (that is, argv[argc]) will be null (0).
The return value of the function is the result of expanding the function.

char* illumina_sample(const char *function_name, unsigned int argc, char **argv)
    {
    /** extract the filename(s), build and return a string containing the samples */
    }

Compiling

the plugin must be compiled as a dynamic C library.

Note: The manual says that make can also build the library itself from a rule in the final 'Makefile' (via load ./illumina.so), but this failed for me when the library was missing (illumina.so cannot open shared object file: No such file or directory)

so I compiled it by hand:

gcc -Wall -I/path/to/sources/make-4.0 -shared -fPIC -o illumina.so illumina.c

Test

here is the makefile:

SAMPLES=  NA10831_ATCACG_L002_R1_001.fastq.gz \
      hello \
      NA10832_ATCACG_L002_R1_001.fastq.gz \
      NA10831_ATCACG_L002_R2_001.fastq.gz \
      NA10832_ATCACG_L002_R2_001.fastq.gz \
      NA10833_ATCAGG_L003_R1_003.fastq.gz \
      NA10833_ATCAGG_L003_R1_003.fastq.gz \
      ERROR_ATCAGG_x003_R3_0z3.fastq.gz \
      false

all:
    @echo "SAMPLES: " $(illumina_sample  ${SAMPLES} )
    @echo "BARCODES: " $(illumina_barcode  ${SAMPLES} )
    @echo "LANE: " $(illumina_lane  ${SAMPLES} )
    @echo "READ: " $(illumina_read  ${SAMPLES} )
    @echo "SET: " $(illumina_set  ${SAMPLES} )

output:

$ make
SAMPLES:  NA10831 NA10832 NA10833
BARCODES:  ATCACG ATCAGG
LANE:  L002 L003
READ:  R1 R2
SET:  001 003

That's it,

Pierre

20 December 2012

RDF/Jena: a simple extension for XSLT/XALAN. Testing with NCBI-Gene

In a previous post, I showed that the XALAN XSLT engine can be extended with a custom function returning a DOM Document that is then used by the XSLT stylesheet. Here, I'll create an extension for XALAN that gets some RDF statements from a Jena/RDF model. The RDF model will be loaded in memory, but one could also use a persistent model (TDB or SDB). I'll download a record from NCBI Gene, transform it to HTML and use the Disease Ontology database as RDF to annotate it.

A Gene record is downloaded as XML from NCBI gene:

curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=4853&retmode=xml" > notch2.xml
The disease ontology is downloaded as RDF/XML:
curl -odoid.owl "http://www.berkeleybop.org/ontologies/doid.owl"

The XSLT Stylesheet

The stylesheet declares the extension jena, loads the RDF model ("$model"), searches for the OMIM identifiers in the Gene record and loads the RDF statements related to that OMIM-ID.
For example the following xpath expression:
jena:query(
   $model,
   $doiid,
   'http://www.geneontology.org/formats/oboInOwl#hasExactSynonym',
   ''
   )
returns an RDF/XML document containing the RDF statements with subject=$doiid, the property "http://www.geneontology.org/formats/oboInOwl#hasExactSynonym" and any object.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Statement>
    <rdf:subject rdf:resource="http://purl.obolibrary.org/obo/DOID_0050721"/>
    <rdf:predicate rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasExactSynonym"/>
    <rdf:object>Phosphoserine phosphatase deficiency</rdf:object>
  </rdf:Statement>
</rdf:RDF>
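In the call above the empty string stands for "any object". The matching rule can be sketched over a plain in-memory array of triples. This is only an illustration in JavaScript, not the Jena-based Java extension; matchStatements is a hypothetical name and the second triple is a made-up placeholder.

```javascript
// Illustrative only: filter triples by subject/predicate/object,
// treating an empty string as "match anything".
function matchStatements(triples, subject, predicate, object) {
  const ok = (pattern, value) => pattern === "" || pattern === value;
  return triples.filter(
    (t) => ok(subject, t.subject) && ok(predicate, t.predicate) && ok(object, t.object)
  );
}

const DOID = "http://purl.obolibrary.org/obo/DOID_0050721";
const HAS_EXACT_SYNONYM = "http://www.geneontology.org/formats/oboInOwl#hasExactSynonym";
const triples = [
  { subject: DOID, predicate: HAS_EXACT_SYNONYM, object: "Phosphoserine phosphatase deficiency" },
  // placeholder statement, invented for this example
  { subject: DOID, predicate: "http://example.org/otherProperty", object: "placeholder" },
];

console.log(matchStatements(triples, DOID, HAS_EXACT_SYNONYM, ""));
```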
The stylesheet:

The Java code

This is the java extension: the constructor loads the RDF model in memory. The function query(..) returns a RDF/XML document matching the query.

Makefile




config.mk:

Result

java -cp ${class.path} org.apache.xalan.xslt.Process \
 -IN notch2.xml \
 -XSL gene2html.xsl -EDUMP -OUT result.html


NOTCH2

Omim ID 610205
  Label: Alagille syndrome
  Synonym: Arteriohepatic dysplasia (disorder)
  Sub-Class Of:
    Label: gastrointestinal system disease
    Synonym: gastrointestinal disease
    Sub-Class Of:
      Label: disease of anatomical entity
      Sub-Class Of:
        Label: disease

Omim ID 102500
  Label: Hajdu-Cheney syndrome
  Synonym: Hajdu-Cheney syndrome (disorder)
  Sub-Class Of:
    Label: autosomal dominant disease
    Sub-Class Of:
      Label: autosomal genetic disease
      Sub-Class Of:
        Label: monogenic disease
        Sub-Class Of:
          Label: genetic disease
          Sub-Class Of:
            Label: disease

That's it,


Pierre


11 August 2010

Mwncbi, a mediawiki extension loading asynchronously some records from the NCBI

I've just created a new extension for MediaWiki. This extension creates a handler for three new tags: <ncbigene/>, <ncbisnp> and <ncbipubmed>.

Each of those tags asynchronously downloads an XML record from the NCBI (Gene, PubMed or dbSNP) using NCBI EFetch. The XML is then transformed to HTML on the client side with an XSLT stylesheet and inserted in the MediaWiki page. (As the XSLT processor is specific to Firefox, I'm afraid this extension won't run in other browsers.) As I'm using XSLT, the stylesheets are easy to modify and hence the HTML rendering is fully customizable.

The source code is available on github: http://github.com/lindenb/mw4bio/

The installation was described on Mediawiki.org: http://www.mediawiki.org/wiki/Extension:Mwncbi.

Screenshots: Gene, Pubmed, dbSNP

That's it

Pierre

06 August 2010

A MediaWiki extension displaying the UCSC Genome Browser

Today I wrote an extension for MediaWiki that displays the UCSC Genome Browser in an HTML <iframe/>. This extension will help my colleagues annotate some candidate genes through our local wiki.

This extension handles a new tag <ucsciframe> composed of three required parameters: 'chrom', 'start' and 'end'.

For example
<ucsciframe chrom="chr2" start="98987" end="9879899"/>
The source code for this extension is available at:
Its documentation is available on www.mediawiki.org.
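The three attributes of the tag map naturally onto the position parameter of the UCSC hgTracks page. A minimal sketch of that mapping, in JavaScript rather than the PHP of the extension: ucscBrowserUrl is a hypothetical helper, and the genome build (db) is an assumption because the tag above has no such attribute.

```javascript
// Hypothetical helper: build the hgTracks URL an <iframe> could point to.
// The default genome build "hg18" is an assumption, not part of the tag.
function ucscBrowserUrl(chrom, start, end, db = "hg18") {
  const s = Number(start);
  const e = Number(end);
  if (!/^chr\w+$/.test(chrom) || !Number.isInteger(s) || !Number.isInteger(e) || s < 0 || s > e) {
    throw new RangeError("invalid genomic interval");
  }
  const position = encodeURIComponent(`${chrom}:${s}-${e}`);
  return `http://genome.ucsc.edu/cgi-bin/hgTracks?db=${encodeURIComponent(db)}&position=${position}`;
}

console.log(ucscBrowserUrl("chr2", "98987", "9879899"));
```

Tag attributes come from wiki text that anyone can edit, so validating them before building the URL avoids injecting arbitrary strings into the page.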

That's it !

Pierre

10 January 2010

What is the CSS style of that HTML element ?: CSSPopup, an extension for firefox

Finding the CSS style of an HTML element is a common task for me, and I often dig through the <style/> of a page to work out which rules are "this inspiring CSS". So I've created CSSPopup, a small extension for Firefox. This extension adds a new item to the contextual menu that prints the CSS properties of the element that was clicked, followed by those of each of its ancestors. For example, when I clicked on "Welcome to NCBI" at http://www.ncbi.nlm.nih.gov/, the result was:

h1 {
font-size:32px;
font-weight:bold;
line-height:36px;
margin-bottom:21.4333px;
margin-top:21.4333px;
padding-left:16px;
-moz-column-gap:32px;
}

div {
}

div {
}

div {
}

div {
margin-bottom:0px;
margin-left:0px;
margin-right:0px;
margin-top:0px;
}

body {
margin-bottom:8px;
margin-left:8px;
margin-right:8px;
margin-top:8px;
}

html {
background-attachment:scroll;
background-color:transparent;
background-image:none;
background-position:0% 0%;
background-repeat:repeat;
border-collapse:separate;
border-spacing:0px 0px;
bottom:auto;
caption-side:top;
clear:none;
clip:auto;
color:rgb(0, 0, 0);
content:none;
counter-increment:none;
counter-reset:none;
cursor:auto;
direction:ltr;
display:block;
empty-cells:show;
float:none;
font-family:serif;
font-size:16px;
font-size-adjust:none;
font-style:normal;
font-variant:normal;
font-weight:400;
height:auto;
ime-mode:auto;
left:auto;
letter-spacing:normal;
line-height:19px;
list-style-image:none;
list-style-position:outside;
list-style-type:disc;
margin-bottom:0px;
margin-left:0px;
margin-right:0px;
margin-top:0px;
marker-offset:auto;
max-height:none;
max-width:none;
min-height:0px;
min-width:0px;
opacity:1;
outline-color:rgb(0, 0, 0);
outline-offset:0px;
outline-style:none;
outline-width:0px;
overflow:visible;
overflow-x:visible;
overflow-y:visible;
padding-bottom:0px;
padding-left:0px;
padding-right:0px;
padding-top:0px;
page-break-after:auto;
page-break-before:auto;
pointer-events:visiblepainted;
position:static;
quotes:"“" "”" "‘" "’";
right:auto;
table-layout:auto;
text-align:start;
text-decoration:none;
text-indent:0px;
text-rendering:auto;
text-shadow:none;
text-transform:none;
top:auto;
unicode-bidi:normal;
vertical-align:baseline;
visibility:visible;
white-space:normal;
width:auto;
word-spacing:normal;
word-wrap:normal;
z-index:auto;
-moz-appearance:none;
-moz-background-clip:border;
-moz-background-inline-policy:continuous;
-moz-background-origin:padding;
-moz-binding:none;
-moz-border-bottom-colors:none;
-moz-border-left-colors:none;
-moz-border-right-colors:none;
-moz-border-top-colors:none;
-moz-border-image:none;
-moz-border-radius-bottomleft:0px;
-moz-border-radius-bottomright:0px;
-moz-border-radius-topleft:0px;
-moz-border-radius-topright:0px;
-moz-box-align:stretch;
-moz-box-direction:normal;
-moz-box-flex:0;
-moz-box-ordinal-group:1;
-moz-box-orient:horizontal;
-moz-box-pack:start;
-moz-box-shadow:none;
-moz-box-sizing:content-box;
-moz-column-count:auto;
-moz-column-gap:16px;
-moz-column-width:auto;
-moz-column-rule-width:0px;
-moz-column-rule-style:none;
-moz-column-rule-color:rgb(0, 0, 0);
-moz-float-edge:content-box;
-moz-force-broken-image-icon:0;
-moz-image-region:auto;
-moz-outline-color:rgb(0, 0, 0);
-moz-outline-offset:0px;
-moz-outline-radius-bottomleft:0px;
-moz-outline-radius-bottomright:0px;
-moz-outline-radius-topleft:0px;
-moz-outline-radius-topright:0px;
-moz-outline-style:none;
-moz-outline-width:0px;
-moz-stack-sizing:stretch-to-fit;
-moz-transform:none;
-moz-transform-origin:50% 50%;
-moz-user-focus:none;
-moz-user-input:auto;
-moz-user-modify:read-only;
-moz-user-select:auto;
-moz-appearance:none;
-moz-user-select:auto;
}
The extension can be downloaded at http://code.google.com/p/lindenb/downloads/list and the source code is available at http://code.google.com/p/lindenb/source/browse/trunk/proj/tinyxul/csspopup/.

That's it,
Pierre

04 January 2009

An extension for MediaWiki: displaying a DNA sequence

This post is about a new extension for MediaWiki (the wiki engine of Wikipedia, written in PHP). This was the first extension I wrote: it adds a new custom tag <dnaseq> and simply displays a DNA sequence. Here is a screenshot of this extension installed in my local MediaWiki.


and the source code for this extension is available here:


First we tell MediaWiki about this new extension in ${MWROOT}/LocalSettings.php
require_once("$IP/extensions/dnaseq/dnaseq.php");

Then we install this new feature, which is a new tag named <dnaseq>. Each time MediaWiki finds a <dnaseq> tag, the function myRenderDnaSequence is called.
$wgHooks['ParserFirstCallInit'][] = 'myDnaSequence';
(...)
function myDnaSequence()
{
global $wgParser;
$wgParser->setHook( 'dnaseq', 'myRenderDnaSequence' );
return true;
}

myRenderDnaSequence is the function returning the formatted DNA sequence:
function myRenderDnaSequence( $input, $args, $parser )
{
    if($input==null) return "";
    $len= strlen($input);
    $n=0; /* number of bases printed so far */
    $html="<div style='padding: 10px; font-size:10px; border-width: thin; border: 1px black solid; white-space: pre;background-color: white;font-family: courier, monospace;line-height:13px; font-size:12px;'>";
    for($i=0;$i< $len;$i++)
    {
        $c = $input[$i];
        /* skip whitespace and any position numbers pasted with the sequence */
        if(ctype_space($c) || ctype_digit($c)) continue;
        if($n % 60 == 0)
        {
            /* start a new line of 60 bases, prefixed with its 1-based position */
            if($n!=0) $html.="<br/>";
            $html.= sprintf("%06d ",($n+1));
        }
        else if($n % 10 ==0)
        {
            /* separate blocks of 10 bases */
            $html.=" ";
        }
        $n++;
        switch(strtolower($c))
        {
            case "a":
                $html.="<span style='color:green;'>".$c."</span>";
                break;
            case "c":
                $html.="<span style='color:blue;'>".$c."</span>";
                break;
            case "g":
                $html.="<span style='color:black;'>".$c."</span>";
                break;
            case "t":
            case "u":
                $html.="<span style='color:red'>".$c."</span>";
                break;
            default:
                /* unknown symbol: escape it byte-wise so that wiki text cannot inject HTML */
                $html.="<span style='text-decoration:blink;color:gray'>".htmlspecialchars($c, ENT_QUOTES, 'ISO-8859-1')."</span>";
                break;
        }
        if($n % 60 == 0)
        {
            /* close the line with the position of its last base */
            $html.= sprintf(" %06d",($n));
        }
    }
    $html .= "</div>";
    return $html;
}


That's it.
Pierre

17 March 2008

xul4wikipedia

I've added a few more individuals to my History Of Science, and I've also tried to generate an iCal version of this dataset to display the birth/death dates of all those persons (http://lindenb.integragen.org/xulhistory/history.ical). However, there is a bug in this file: the events are not displayed correctly in Google Calendar. Does anyone know why?

There is now a beautiful new version of Freebase, but it is a little slower, and as I want to edit a large number of individuals, the procedure now takes too much time for me. I received a kind mail from Robert Cook of Metaweb about this problem, telling me that they're working on the issue. I also noticed that, just like Wikipedia, they are now much more concerned about the origin of the pictures. That's fair, but I wish I could add a picture to all those individuals :-) . I could draw them, but there would still be a problem of rights :-)

Meanwhile, I've added some infoboxes in Wikipedia. I also created a simple web form at http://lindenb.integragen.org/xul4wikipedia/xul4wikipedia.cgi to create a Firefox extension on the fly. This add-on appends some custom items to the contextual popup menu when editing an article in Wikipedia. Each of those items inserts a custom text in the textarea of the edited article. For example, you won't have to find, copy and paste your favorite Template:Infobox Person: this template will now always be available in your menu. The source code is available here and is broadly inspired by one of my previous posts.

Pierre

02 February 2008

Creating a XUL extension for Mozilla/Firefox: my notebook.

(RSS readers, this file is better displayed on my blog)
Here is my notebook on how to create an extension for Firefox. The following example was tested with Firefox 2.0.0.11. This extension is used to insert a few default templates (such as Template:Infobox_scientist) when editing a biography on Wikipedia. Infoboxes are used, for example by DBpedia, to create a structured version of Wikipedia.

First, create a new profile for firefox, say TEST by invoking firefox with option '-P'

firefox -P

Set up your extension development environment as described here.

I'm now working in the directory ~/XUL:

Create the file ./install.rdf. It's a RDF file describing your extension:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:em="http://www.mozilla.org/2004/em-rdf#">

<rdf:Description about="urn:mozilla:install-manifest">
<!-- my extension ID -->
<em:id>biography-helper@plindenbaum.com</em:id>
<!-- version -->
<em:version>2.0</em:version>
<!-- this is a firefox extension -->
<em:type>2</em:type>

<em:targetApplication>
<rdf:Description>
<!-- this is for firefox -->
<em:id>{ec8030f7-c20a-464f-9b0e-13a3a9e97384}</em:id>
<!-- min/max firefox version -->
<em:minVersion>2.0</em:minVersion>
<em:maxVersion>2.0.0.*</em:maxVersion>
</rdf:Description>
</em:targetApplication>

<!-- name -->
<em:name>Wikipedia Edit Helper!</em:name>
<!-- description -->
<em:description>An Extension for Editing biographies in Wikipedia</em:description>
<!-- author -->
<em:creator>Pierre Lindenbaum</em:creator>
<!-- contact -->
<em:homepageURL>http://plindenbaum.blogspot.com</em:homepageURL>
<!-- icon -->
<em:iconURL>chrome://wiki4biography/skin/darwin32.png</em:iconURL>
</rdf:Description>
</rdf:RDF>


The file ./chrome/content/menu.xul is the XUL interface which will be added to the contextual popup-menu.

<?xml version="1.0" encoding="UTF-8"?>
<overlay id="wiki4biography" xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
<script src="library.js"/>

<popup id="contentAreaContextMenu">
<menuseparator/>
<menu label="Wikipedia" id="menuWikipedia">
<menupopup>

<menuitem label="Infobox Scientist" oncommand="MY.infobox()" />

<menu label="Categories">
<menupopup>
<menuitem label="Astronomers" oncommand="MY.category('Astronomers')"/>
<menuitem label="Biologists" oncommand="MY.category('Biologists')"/>
<menuitem label="Chemists" oncommand="MY.category('Chemists')"/>
<menuitem label="Physicists" oncommand="MY.category('Physicists')"/>
</menupopup>
</menu>

<menu label="Stubs">
<menupopup>
<menuitem label="Astronomer" oncommand="MY.insertTemplate('{{astronomer-stub}}')"/>
<menuitem label="Chemist" oncommand="MY.insertTemplate('{{chemist-stub}}')"/>
<menuitem label="Biologist" oncommand="MY.insertTemplate('{{biologist-stub}}')"/>
<menuitem label="Mathematician" oncommand="MY.insertTemplate('{{mathematician-stub}}')"/>
<menuitem label="Physicist" oncommand="MY.insertTemplate('{{physicist-stub}}')"/>
</menupopup>
</menu>

</menupopup>
</menu>

</popup>
</overlay>



The script used by our menu is ./chrome/content/library.js
var MY={
/** when the xul page is loaded, register for events from the contextual popupmenu */
onload:function()
{
var element = document.getElementById("contentAreaContextMenu");
element.addEventListener("popupshowing",function(evt){MY.preparePopup(evt);},true);
},
/* prepare the contextual menu just before it is showing on screen: hide or show our menu */
preparePopup:function(evt)
{
var element = document.getElementById("menuWikipedia");
if(document.popupNode.id!="wpTextbox1")
{
element.hidden=true;
return;
}
element.hidden=false;
},
/** insert a text at the caret position in the textarea of wikipedia */
insertTemplate:function(text)
{
var area= content.document.getElementById("wpTextbox1");
if(area==null) return;
//alert(area.value.substring(0,20)+" "+area.tagName);
var selstart=area.selectionStart;
var x= area.scrollLeft;
var y= area.scrollTop;
area.value= area.value.substring(0,selstart)+
text+
area.value.substring(area.selectionEnd)
;
area.scrollLeft=x;
area.scrollTop=y;
selstart+=text.length;
area.setSelectionRange(selstart,selstart);
},
/* insert a wikipedia category */
category:function(text)
{
MY.insertTemplate("[[Category:"+text+"]]");
},
/** get current article name */
article:function()
{
var url=""+content.document.location;
var i=url.indexOf("title=",0);
if(i==-1) return "";
i+=6;
var j=url.indexOf("&action",i);
if(j==-1) return "";
return unescape(url.substr(i,j-i).replace(/_/g," ")); /* the regex replaces every underscore, not just the first */
},
/* insert an infobox */
infobox:function()
{
var box="{{Infobox Scientist\n"+
"|name = "+MY.article()+"\n"+
"|box_width =\n"+
"|image = No_free_image_man_%28en%29.svg\n"+ /** sorry, most scientists in wikipedia are men */
"|image_width = 200px\n"+
"|caption = "+MY.article()+"\n"+
"|birth_date = \n"+
"|birth_place = \n"+
"|death_date = \n"+
"|death_place = \n"+
"|residence = \n"+
"|citizenship = \n"+
"|nationality = \n"+
"|ethnicity = \n"+
"|field = \n"+
"|work_institutions = \n"+
"|alma_mater = \n"+
"|doctoral_advisor = \n"+
"|doctoral_students = \n"+
"|known_for = \n"+
"|author_abbrev_bot = \n"+
"|author_abbrev_zoo = \n"+
"|influences = \n"+
"|influenced = \n"+
"|prizes = \n"+
"|footnotes = \n"+
"|signature =\n"+
"}}\n";
MY.insertTemplate(box);
}
};
/* initialize all this stuff */
window.addEventListener("load",MY.onload, false);


The icon ./chrome/skin/darwin32.png is used as an icon for the extension.

The file ./chrome.manifest says what firefox packages and overlays this extension provides.
content wiki4biography chrome/content/
overlay chrome://browser/content/browser.xul chrome://wiki4biography/content/menu.xul
skin wiki4biography classic/1.0 chrome/skin/


To test this extension a file ${HOME}/.mozilla/firefox/testmozilla/extensions/biography-helper@plindenbaum.com is created. This file contains the path to the XUL folder.
/home/pierre/tmp/XUL/

You can test the extension by invoking firefox with the profile "TEST":
firefox -no-remote -P TEST


When your extension is ready you can package it into a *.xpi archive.
zip -r wikipedia.zip chrome chrome.manifest install.rdf
mv wikipedia.zip wikipedia.xpi


That's it. You can download this extension at http://lindenb.integragen.org/xul/wikipedia.xpi and then open it with Firefox, which will ask whether you want to install it. Then edit an article in Wikipedia and right-click in the edit box to get the new contextual menu.


Pierre

06 September 2007

IBM CoScripter: A system for capturing, sharing, and automating tasks on the Web.

Via O'Reilly Radar:

CoScripter is a Firefox extension created by IBM. It is a system for recording, automating, and sharing processes performed in a web browser such as printing photos online, requesting a vacation hold for postal mail, or checking bank account information. Instructions for processes are recorded and stored in easy-to-read text here on the CoScripter web site, so anyone can make use of them.

13 July 2007

URL +1, LSID -1

"URL +1, LSID -1" is the name of the current thread on "public-semweb-lifesci":
http://www.mail-archive.com/public-semweb-lifesci@w3.org/index.html#02766
This discussion (worth a look) is about the Life Science Identifier (LSID) and it was started by Eric Jain:


In the latest release of UniProt (11.3), all URIs of the form:

urn:lsid:uniprot.org:{db}:{id}

have been replaced with URLs:

http://purl.uniprot.org/{db}/{id}

In general, these URLs can be resolved to a human readable web page (a few are still broken, will be fixed). Some of these web pages may (or may not) be linked to a machine-readable representation via link-rel=alternate.

As an optimization for "Semantic Web" crawlers, there is experimental support for "Accept" headers (i.e. set it to "application/rdf+xml").

Some examples:

http://purl.uniprot.org/uniprot/P12345
http://purl.uniprot.org/taxonomy/9606
http://purl.uniprot.org/pdb/1BRC
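The rewrite described in this mail is mechanical, so it can be sketched in a few lines of JavaScript. uniprotLsidToPurl is a hypothetical helper name; the two URI patterns are the ones quoted above.

```javascript
// Hypothetical helper: rewrite urn:lsid:uniprot.org:{db}:{id}
// into http://purl.uniprot.org/{db}/{id}; returns null for anything else.
function uniprotLsidToPurl(lsid) {
  const m = /^urn:lsid:uniprot\.org:([^:]+):([^:]+)$/i.exec(lsid);
  if (m === null) return null;
  return `http://purl.uniprot.org/${m[1]}/${m[2]}`;
}

console.log(uniprotLsidToPurl("urn:lsid:uniprot.org:uniprot:P12345"));
// http://purl.uniprot.org/uniprot/P12345
```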

Among the protagonists we can find Roderic Page, Michel Dumontier, Mark Wilkinson, Alan Ruttenberg, Danny Ayers, etc.


Life Science Identifiers (LSIDs) are persistent, location-independent, resource identifiers for uniquely naming biologically significant resources including species names, concepts, occurrences, genes or proteins, or data objects that encode information about them. To put it simply, LSIDs are a way to identify and locate pieces of biological information on the web.

As far as I understand LSID, we should all use urn:lsid:ncbi.nlm.nih.gov:pubmed:12507336 instead of http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=12507336&ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum or http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&uid=12507336. (Note that the two latter URLs are not the same, but they point to the same article.) An LSID resolver can also be used to discover other (RDF-based) properties of your object.

In the thread, a Firefox extension resolving LSID URIs was described. I just installed it in my Firefox and it looks nice, and the code is really interesting: it shows how to create a Firefox extension that registers a handler for a new internet protocol named "lsidres:".


(...)
LsidModule.registerSelf = function (compMgr, location, loaderStr, type){

// http://developer.mozilla.org/xpcom/api/nsIComponentRegistrar/
compMgr = compMgr.QueryInterface(Components.interfaces.nsIComponentRegistrar);
compMgr.registerFactoryLocation(LSIDPROT_HANDLER_CID,
"Protocol handler for LSID",
"@mozilla.org/network/protocol;1?name=lsidres",
location, loaderStr, type);

}
(...)



Then, when a hyperlink in an HTML page (such as lsidres:urn:lsid:ubio.org:namebank:11815) is activated, Firefox opens a new window, calls a remote LSID resolver and displays the properties of your object.
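Before a resolver can be called, such a link has to be split back into its LSID parts (authority, namespace, object and an optional revision). A minimal sketch of that step, assuming nothing about the extension's real code; parseLsidLink is a hypothetical name.

```javascript
// Hypothetical helper: strip the "lsidres:" scheme and split the LSID
// urn:lsid:<authority>:<namespace>:<object>[:<revision>].
function parseLsidLink(href) {
  const m = /^lsidres:(urn:lsid:([^:]+):([^:]+):([^:]+)(?::([^:]+))?)$/i.exec(href);
  if (m === null) return null;
  return {
    lsid: m[1],
    authority: m[2],
    namespace: m[3],
    object: m[4],
    revision: m[5] || null,
  };
}

console.log(parseLsidLink("lsidres:urn:lsid:ubio.org:namebank:11815"));
```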

Pierre

10 May 2006

Playing with the connotea API (2/2)

A few months ago Ben Lund gave me the opportunity to test a beta version of the Connotea API, and I wondered whether I could build an Annozilla server that could act as a bridge between the Firefox web browser and the Connotea server, allowing scientists to see and share comments about a web site or a paper. As stated on the Annozilla site:

The Annozilla project is designed to view and create annotations associated with a web page, as defined by the W3C Annotea project. The idea is to store annotations as RDF on a server, using XPointer (or at least XPointer-like constructs) to identify the region of the document being annotated. The intention of Annozilla is to use Mozilla's native facilities to manipulate annotation data - its built-in RDF handling to parse the annotations, and nsIXmlHttpRequest to submit data when creating annotations.

In this example: firefox opened the NCBI home page (right side). Once the page is loaded, annozilla fetches the bookmarks about NCBI from connotea (top left). Double clicking in the annotations makes annozilla download the body of those bookmarks (bottom left).


The JAVA servlet I wrote is available at :



But wait, there is a problem: for "security reasons" Annozilla does not use the "GET" parameters of a URL (I really understand that). So when the following URL is submitted:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14870871

Annozilla ignores all the parameters and inserts the comments for:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi



which is really less interesting !!!... Anyway, this was still nice to write, and it was a proof of concept on how to use the Connotea API.
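The effect of dropping the query string is easy to reproduce. The sketch below illustrates the behaviour described above and is not Annozilla's code; withoutQuery is a hypothetical name. Two different PubMed records end up with the same key.

```javascript
// Illustration only: remove the query string (and fragment) from a URL.
function withoutQuery(url) {
  const u = new URL(url);
  return u.origin + u.pathname;
}

const base = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=";
console.log(withoutQuery(base + "14870871"));
// http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
console.log(withoutQuery(base + "14870871") === withoutQuery(base + "12507336")); // true
```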

update: 2010-08-12: source code

/*
Copyright (c) 2006 Pierre Lindenbaum PhD

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
``Software''), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

The name of the authors when specified in the source files shall be
kept unmodified.

THE SOFTWARE IS PROVIDED ``AS IS'', WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL 4XT.ORG BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.


$Id: $
$Author: $
$Revision: $
$Date: $
$Locker: $
$RCSfile: $
$Source: $
$State: $
$Name: $
$Log: $


*************************************************************************/
package org.lindenb.annotea.server;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.MalformedURLException;
import java.net.Socket;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.security.MessageDigest;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Iterator;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;


import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

import com.oreilly.servlet.Base64Decoder;


/**
* @author lindenb
* http://islande:8080/annotea/Annotea
*/
public class AnnoteaServer extends HttpServlet
{
/**
* serialVersionUID
*/
private static final long serialVersionUID = 1L;
/** flag for debugging on/off */
private static boolean DEBUG=false;

//static declaration of xml namespaces
static public final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
static public final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";
static public final String DC = "http://purl.org/dc/elements/1.1/";
static public final String XMLNS= "http://www.w3.org/2000/xmlns/";
static public final String AN = "http://www.w3.org/2000/10/annotation-ns#";
static public final String XHTML = "http://www.w3.org/1999/xhtml" ;
static public final String HTTP = "http://www.w3.org/1999/xx/http#";
static public final String CONNOTEA ="http://www.connotea.org/2005/01/schema#";
static public final String TERMS = "http://purl.org/dc/terms/";



/** query parameter name as specified in the spec */
static final String QUERY_ANNOTATION_PARAMETER="w3c_annotates";
/** default number of rows to fetch from http://www.connotea.org */
static final int CONNOTEA_DEFAULT_NUM_ROWS=10;
/** number of rows to fetch from http://www.connotea.org */
int connoteaNumRowsToFetch =CONNOTEA_DEFAULT_NUM_ROWS;



/** @see javax.servlet.GenericServlet#init() */
public void init() throws ServletException
{
//init number of rows
String s = getInitParameter("connoteaNumRowsToFetch");
if(s!=null)
{
try
{
this.connoteaNumRowsToFetch=Integer.parseInt(s);
if(this.connoteaNumRowsToFetch<=0) this.connoteaNumRowsToFetch=CONNOTEA_DEFAULT_NUM_ROWS;
}
catch(Exception err)
{
throw new ServletException(err);
}
}
}


/** return the hexadecimal MD5 digest of a string (connotea identifies a bookmarked URL by this hash) */
static String toMD5(String url) throws ServletException
{
StringBuffer result= new StringBuffer();
try
{
MessageDigest md = MessageDigest.getInstance("MD5");

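//use an explicit charset: the platform default may differ from one server to another
md.update(url.getBytes("UTF-8"));
byte[] digest = md.digest();

for (int k=0; k<digest.length; k++)
{
//mask with 0xFF so that negative bytes are not sign-extended to 8 hex digits
String hex = Integer.toHexString(digest[k] & 0xFF);
if (hex.length() == 1) hex = "0" + hex;
result.append(hex);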
}
}
catch(Exception err)
{
throw new ServletException(err);
}
return result.toString();
}

/** @see javax.servlet.http.HttpServlet#doGet(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) */
protected void doGet(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
{
res.setContentType("text/xml");
String urlquery= req.getParameter(QUERY_ANNOTATION_PARAMETER);
PrintWriter out=res.getWriter();

String pathInfo = req.getPathInfo(); // /a/b;c=123

/**
*
* WE FOUND URLQUERY
*
*/
if(urlquery!=null)
{
debug("query is "+urlquery);
String authorizationUTF8= URLEncoder.encode(getAuthorization(req),"UTF-8");
String md5=toMD5(urlquery);
Document doc= getConnoteaRDF(
"http://www.connotea.org/data/bookmarks/uri/"+md5,
req
);

String postCount=null;
String created=null;
if(doc!=null)
{
Element root= doc.getDocumentElement();
if(root==null || !isA(root,RDF,"RDF")) throw new ServletException("Bad XML root from connotea");

for(Node n1= root.getFirstChild();n1!=null;n1=n1.getNextSibling())
{
if(!isA(n1,TERMS,"URI")) continue;
for(Node n2= n1.getFirstChild();n2!=null;n2=n2.getNextSibling())
{
if(isA(n2,CONNOTEA,"postCount"))
{
postCount=textContent(n2).replace('\n',' ').trim();
}
else if(isA(n2,CONNOTEA,"created"))
{
created=textContent(n2).replace('\n',' ').trim();
}
}
break;
}
}

debug("postcount="+postCount);

out.print(
"<?xml version=\"1.0\" ?>\n" +
"<r:RDF xmlns:r=\""+RDF+"\"\n" +
" xmlns:a=\""+AN+"\"\n" +
" xmlns:d=\""+DC+"\">\n"
);
if(postCount!=null)
{
out.print(
" <r:Description r:about=\""+getBaseURL(req)+"/"+md5+"\">\n" +
" <r:type r:resource=\""+ AN +"Annotation\"/>\n" +
" <r:type r:resource=\"http://www.w3.org/2000/10/annotationType#Comment\"/>\n" +
//the queried URL may contain '&' or '<': escape it before writing it into the XML
" <a:annotates r:resource=\""+escape(urlquery)+"\"/>\n" +
" <d:title>"+postCount+" Annotation(s) of "+escape(urlquery)+" on connotea</d:title>\n" +
" <a:context>" + escape(urlquery)+"#xpointer(/html[1])</a:context>\n" +
" <d:creator>Connotea</d:creator>\n" +
" <a:created>"+created+"</a:created>\n" +
" <d:date>"+created+"</d:date>\n" +
//a:body is an empty element: it must be closed or the RDF is not well-formed
" <a:body r:resource=\""+getBaseURL(req)+"/body/"+md5+"?authorization="+authorizationUTF8+"\"/>\n"+
" </r:Description>\n"
);
}

out.print("</r:RDF>\n");
}
/**
*
* /BODY/ in pathinfo
*
*/
else if(pathInfo!=null && pathInfo.startsWith("/body/"))
{
String md5=pathInfo.substring(6);
Document doc=getConnoteaRDF("http://www.connotea.org/data/uri/"+md5+
"?num="+this.connoteaNumRowsToFetch,
req);
if(doc==null)
{
throw new ServletException("Cannot get http://www.connotea.org/data/uri/"+md5);
}
Element root= doc.getDocumentElement();
if(root==null || !isA(root,RDF,"RDF")) throw new ServletException("Bad XML root from connotea");


out.print("<?xml version=\"1.0\" ?>\n"+
"<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
);

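//debugging aid: also append the generated page to the log file, only when DEBUG is on
if(DEBUG)
{
PrintWriter log= new PrintWriter(new FileWriter("/tmp/logAnnotea.txt",true));
printHTMLBodyFromConnotea(log,md5,root);
log.close();
}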
printHTMLBodyFromConnotea(out,md5,root);
}
/**
*
* other
*
*/
else
{
debug("pathinfo not handled ?");
out.print(
"<?xml version=\"1.0\" ?>\n" +
"<r:RDF xmlns:r=\""+RDF+"\"\n" +
" xmlns:a=\""+AN+"\"\n" +
" xmlns:d=\""+DC+"\">\n" +
"<a:annotates r:resource=\""+
req.getRequestURL()+ "\"/>"+
"</r:RDF>\n"
);
}


out.flush();
}

/** print the connotea posts found under the RDF root as an XHTML page */
private void printHTMLBodyFromConnotea(PrintWriter out,String md5,Element root)
{

out.print("<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">"+
"<body>");
out.print("<img src=\"http://www.connotea.org/connotealogo.gif\" alt=\"connotea\"/><br/>");

for(Node n1= root.getFirstChild();n1!=null;n1=n1.getNextSibling())
{
if(isA(n1,CONNOTEA,"Post"))
{

String title=null;
String user=null;
String uri=null;
String created=null;
HashSet subject= new HashSet();
for(Node n2= n1.getFirstChild();n2!=null;n2=n2.getNextSibling())
{
if(isA(n2,DC,"creator")) { user=textContent(n2).trim();}
else if(isA(n2,CONNOTEA,"title")) { title=textContent(n2).trim();}
else if(isA(n2,CONNOTEA,"created")) {
created=textContent(n2).trim();
int T=created.indexOf('T');
if(T!=-1) created= created.substring(0,T);
}
else if(isA(n2,DC,"subject")) { subject.add(textContent(n2).trim());}
else if(isA(n2,CONNOTEA,"uri"))
{
for(Node n3= n2.getFirstChild();n3!=null;n3=n3.getNextSibling())
{
if(isA(n3,TERMS,"URI"))
{
Element e3=(Element)n3;
uri=e3.getAttributeNS(RDF,"about").trim();
}
}
}
}
out.print("<div>");

if(title!=null)
{
if(uri!=null) out.print("<h4><a target=\"ext\" title=\""+escape(uri)+"\" href=\""+escape(uri)+"\">");
out.print(""+escape(title));
if(uri!=null) out.print("</a>");
out.print(" (<a target=\"ext\" title=\"http://www.connotea.org/uri/"+md5+"\" href=\"http://www.connotea.org/uri/"+md5+"\">info</a>)");
out.print("</h4>");
}


if(user!=null)
{
out.print("Posted by <a target=\"ext\" href=\"http://www.connotea.org/user/"+escape(user)+"\">"+escape(user)+"</a>");
if(!subject.isEmpty())
{
out.print(" to");
for (Iterator iter = subject.iterator(); iter.hasNext();)
{
String sbj=escape((String)iter.next());
out.print(" <a target=\"ext\" href=\"http://www.connotea.org/tag/"+sbj+"\">"+sbj+"</a>");

}
}
if(created!=null ) out.print(" on <a target=\"ext\" href=\"http://www.connotea.org/date/"+created+"\">"+created+"</a>");
out.print("<br/>");
}
out.print("</div><hr/>");
}


}
out.print("<a title=\"mailto:plindenbaum@yahoo.fr\" href=\"mailto:plindenbaum@yahoo.fr\">plindenbaum@yahoo.fr</a> Pierre Lindenbaum PhD<br/>");
out.print("<div align=\"center\"><img border=\"1\" src=\"http://www.integragen.com/img/title.png\"/></div>");
out.print("</body></html>");
out.flush();
}


/**
* @see javax.servlet.http.HttpServlet#doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) */
protected void doPost(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
{
Document doc=null;
try
{
doc= newDocumentBuilder().parse(req.getInputStream());
}
catch(SAXException err)
{
debug("cannot get document from input stream");
throw new ServletException(err);
}

String context=null;
String content=null;
Element root= doc.getDocumentElement();
if(root==null) throw new ServletException("Cannot find root element of input");
debug("DOM1 content is "+DOM2String(root));
for(Node c1= root.getFirstChild();c1!=null;c1=c1.getNextSibling())
{
if(!isA(c1,RDF,"Description")) continue;
for(Node c2= c1.getFirstChild();c2!=null;c2=c2.getNextSibling())
{
if(c2.hasChildNodes()&& isA(c2,AN,"context"))
{
Element e2=(Element)c2;
context=textContent(e2).trim();
int xpointer=context.indexOf("#xpointer");
if(xpointer!=-1) context=context.substring(0,xpointer);
}
else if(c2.hasChildNodes()&& isA(c2,AN,"body"))
{
for(Node c3= c2.getFirstChild();c3!=null;c3=c3.getNextSibling())
{
if(!(c3.hasChildNodes()&& isA(c3,HTTP,"Message"))) continue;

for(Node c4= c3.getFirstChild();c4!=null;c4=c4.getNextSibling())
{
if(!(c4.hasChildNodes()&& isA(c4,HTTP,"Body"))) continue;
content= textContent(c4).trim().replaceAll("[ ]+"," ");
debug("DOM content is "+DOM2String(c4));
}

}
}
}
}
if(context==null)
{
throw new ServletException("Cannot find context in "+DOM2String(root));
}
if(content==null)
{
throw new ServletException("Cannot find content in "+DOM2String(root));
}


/** check if the bookmark already exists */
try
{
String usertitle=null;
String comment=null;
StringBuffer description=new StringBuffer();
HashSet tags= new HashSet();

doc= getConnoteaRDF(
"http://www.connotea.org/data/user/"+
getLogin(req)+"/uri/"+
toMD5(context),req
);
if(doc!=null)
{
root= doc.getDocumentElement();
if(root!=null)
{
for(Node n1= root.getFirstChild();n1!=null;n1=n1.getNextSibling())
{
if(isA(n1,CONNOTEA,"Post"))
{
for(Node n2= n1.getFirstChild();n2!=null;n2=n2.getNextSibling())
{
if(isA(n2,CONNOTEA,"title")) { usertitle=textContent(n2).trim();}
else if(isA(n2,DC,"subject")) { tags.add(textContent(n2).trim().toLowerCase());}
else if(isA(n2,CONNOTEA,"description")) { description= new StringBuffer(textContent(n2).trim()+"\n");}
}
break;
}
}
}
debug("existed: title="+usertitle+" subject="+tags+ " desc="+description);
}


/* construct URL to connotea */

BufferedReader buffReader= new BufferedReader(new StringReader(content));
debug("content is "+content);
String line=null;

while((line=buffReader.readLine())!=null)
{
line= line.trim();
String upper= line.toUpperCase();
//parse tags
if(upper.startsWith("TAG:"))
{
try
{
StreamTokenizer parser= new StreamTokenizer(
new StringReader(line.substring(4))
);
parser.quoteChar('"');
parser.quoteChar('\'');
parser.eolIsSignificant(false);
parser.lowerCaseMode(true);
parser.ordinaryChars('0','9');
parser.wordChars('0','9');
parser.whitespaceChars(',',',');
parser.whitespaceChars(';',';');
parser.whitespaceChars('(','(');
parser.whitespaceChars(')',')');

while ( parser.nextToken() != StreamTokenizer.TT_EOF )
{
if ( parser.ttype == StreamTokenizer.TT_WORD)
{
tags.add(parser.sval.toLowerCase());
}
else if ( parser.ttype == StreamTokenizer.TT_NUMBER )
{
//hum... ignoring number
//items.add(""+parser.nval);
}
else if ( parser.ttype == StreamTokenizer.TT_EOL )
{
continue;
}
else if(parser.sval!=null)
{
tags.add(parser.sval.toLowerCase()) ;
}
}
}
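catch(Exception ex)
{
//a malformed TAG: line should not abort the whole post: log it and go on
debug("cannot parse tags in \""+line+"\" : "+ex);
}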
}
else if(upper.startsWith("TI:"))
{
usertitle= line.substring(3).trim();
if(usertitle.length()==0) usertitle=null;
}
else if(upper.startsWith("COM:"))
{
comment= line.substring(4).trim();
if(comment.length()==0) comment=null;
}
else
{
description.append(line+"\n");
}
}
//add a default tag if none was declared
if(tags.isEmpty()) tags.add("annoteated");

if(description.length()==0) description= new StringBuffer(context);
//build tags parameter
StringBuffer tagStr= new StringBuffer();
for (Iterator iter = tags.iterator(); iter.hasNext();)
{
if(tagStr.length()!=0) tagStr.append(",");
tagStr.append(iter.next().toString());
}

debug("tagstr="+tagStr);


URL url=new URL("http://www.connotea.org:80/data/add");

String postbody=
"uri=" + URLEncoder.encode(context,"UTF-8")+
(usertitle==null?"":"&usertitle="+URLEncoder.encode(usertitle,"UTF-8"))+
"&description="+URLEncoder.encode(description.toString(),"UTF-8")+
(comment==null?"":"&comment="+URLEncoder.encode(comment,"UTF-8"))+
"&tags=" + URLEncoder.encode(tagStr.toString(),"UTF-8")+
"&annoteaflag=hiBenThisIsPierre"
;

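//HTTP header lines must end with CRLF; "Connection: close" lets us read the response until EOF
StringBuffer poststring= new StringBuffer();
poststring.append("POST "+url.getFile()+" HTTP/1.1\r\n");
poststring.append("Host: "+url.getHost()+"\r\n");
poststring.append("Authorization: "+getAuthorization(req)+"\r\n");
poststring.append("Content-Type: application/x-www-form-urlencoded\r\n");
//postbody is URL-encoded, hence pure ASCII: its length in chars is its length in bytes
poststring.append("Content-Length: "+postbody.length()+"\r\n");
poststring.append("Connection: close\r\n");
poststring.append("\r\n");
poststring.append(postbody);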


Socket socket= new Socket(url.getHost(),url.getPort());
InputStream from_server= socket.getInputStream();
PrintWriter to_server= new PrintWriter(
new OutputStreamWriter(socket.getOutputStream()));


to_server.print(poststring.toString());
to_server.flush();


StringBuffer response= new StringBuffer();
int c; while((c=from_server.read())!=-1) { response.append((char)c);}


to_server.close();
from_server.close();
socket.close();
debug("sent "+ poststring+" response is:\n"+response+"\n");
}
catch(Exception err)
{
debug("error "+err);
throw new ServletException(err);
}





{
String msg="<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<rdf:RDF xmlns:rdf=\""+RDF+"\"\n" +
" xmlns:a=\""+AN+"\"\n" +
" xmlns:dc=\""+DC+"\">\n" +
" <rdf:Description rdf:about=\""+
getBaseURL(req)+"/"+toMD5(context)
+"\">\n" +
" <dc:creator>connotea user</dc:creator>\n"+
" <a:created>2006-01-31</a:created>\n"+
" <dc:date>2006-01-31</dc:date>\n"+
" <a:annotates rdf:resource=\""+escape(context)+"\"/>\n" +
" <rdf:type rdf:resource=\""+ AN +"Annotation\"/>\n" +
" <rdf:type rdf:resource=\"http://www.w3.org/2000/10/annotationType#Comment\"/>\n" +
//" <a:body rdf:resource=\""+getBaseURL(req)+"/body/"+toMD5(context) +"\"/>\n" +
" </rdf:Description>\n" +
"</rdf:RDF>";



res.setStatus(HttpServletResponse.SC_CREATED);
res.setHeader("Connection","close");
res.setHeader("Pragma","no-cache");
res.setContentType("text/xml");
res.setContentLength(msg.length());



PrintWriter out= res.getWriter();
out.print(msg);
debug("message sent is "+msg);
out.flush();
}
}




/** return the base URL of this servlet */
private static String getBaseURL(HttpServletRequest req) throws ServletException
{
String scheme = req.getScheme(); // http
String serverName = req.getServerName(); // hostname.com
int serverPort = req.getServerPort(); // 80
String contextPath = req.getContextPath(); // /mywebapp
String servletPath = req.getServletPath(); // /servlet/MyServlet
//String pathInfo = req.getPathInfo(); // /a/b;c=123
//String queryString = req.getQueryString(); // d=789
return scheme+"://"+serverName+":"+serverPort+contextPath+servletPath;
}

/** creates a new namespace aware DocumentBuilder parsing DOM */
private static DocumentBuilder newDocumentBuilder() throws ServletException
{
DocumentBuilderFactory factory= DocumentBuilderFactory.newInstance();
factory.setCoalescing(true);
factory.setNamespaceAware(true);
factory.setExpandEntityReferences(true);
factory.setIgnoringComments(true);
factory.setValidating(false);
try
{
return factory.newDocumentBuilder();
} catch (ParserConfigurationException error)
{
throw new ServletException(error);
}
}

/** simple escape XML function */
public static String escape(String s)
{
if(s==null) return s;
StringBuffer buff= new StringBuffer();
for(int i=0;i< s.length();++i)
{
switch(s.charAt(i))
{
case('\"') : buff.append("&quot;"); break;
case('\'') : buff.append("&apos;"); break;
case('&') : buff.append("&amp;"); break;
case('<') : buff.append("&lt;"); break;
case('>') : buff.append("&gt;"); break;
default: buff.append(s.charAt(i)); break;
}
}
return buff.toString();
}

/** simple test for XML element */
static public boolean isA(Node e,String ns, String localname)
{
if(e==null) return false;
//getLocalName() can return null for some nodes, so call equals() on the argument
return ns.equals(e.getNamespaceURI()) && localname.equals(e.getLocalName());
}

/** get text content of a DOM node */
static public String textContent(Node node)
{
return textContent(node,new StringBuffer()).toString();
}

/** get text content of a DOM node */
static private StringBuffer textContent(Node node,StringBuffer s)
{
if(node==null) return s;
for(Node c= node.getFirstChild();c!=null;c=c.getNextSibling())
{
if(isA(c,XHTML,"br"))
{
s.append("\n");
}
else if(c.getNodeType()==Node.CDATA_SECTION_NODE)
{
s.append(((CDATASection)c).getNodeValue());
}
else if(c.getNodeType()==Node.TEXT_NODE)
{
s.append(((Text)c).getNodeValue());
}
else
{
textContent(c,s);
}
}
return s;
}


/** download a document from connotea or null if the url was not found */
private Document getConnoteaRDF(String url,HttpServletRequest req) throws ServletException
{
DocumentBuilder builder= newDocumentBuilder();
try
{
return builder.parse(
openSecureURLInputStream(
url,
req
)
);
}
catch (FileNotFoundException e)
{
debug("file not found for "+url+" returning null");
return null;
}
catch(Exception err)
{
debug("cannot download :"+url+ " "+err);
throw new ServletException(err);
}
}


/** get login from header authorization */
static private String getLogin(HttpServletRequest req) throws ServletException
{
return getLoginAndPassword(req)[0];
}

/** get login and password from header authorization */
static private String[] getLoginAndPassword(HttpServletRequest req) throws ServletException
{
String s64= getAuthorization(req);
if(!s64.startsWith("Basic ")) throw new ServletException("header \"authorization\" does not start with \"Basic \" ");
s64=s64.substring(6);
String decoded= Base64Decoder.decode(s64);
int loc= decoded.indexOf(':');
if(loc==-1) throw new ServletException("no \":\" in decoded authorization "+s64);
return new String[]{decoded.substring(0,loc),decoded.substring(loc+1)};
}

/** find parameter authorization in http request */
static String getAuthorization(HttpServletRequest req) throws ServletException
{
String s64= req.getHeader("authorization");
if(s64==null)
{
s64=req.getParameter("authorization");
}
if(s64==null)
{
s64=req.getParameter("Authorization");
}
if(s64==null)
{
Enumeration e= req.getHeaderNames();
StringBuffer headers= new StringBuffer();
while(e.hasMoreElements())
{
String key=(String)e.nextElement();
headers.append(key).append("=").append(req.getHeader(key)).append(";\n");
}
throw new ServletException("no header \"authorization\" was found in\n"+headers);
}
return s64;
}

/** open a URL, filling authorization header */
static private InputStream openSecureURLInputStream(String urlstr,HttpServletRequest req)
throws ServletException,FileNotFoundException
{
URL url=null;
String s64=null;
try
{
s64= getAuthorization(req);
url= new URL(urlstr);
URLConnection uc = url.openConnection();
uc.setRequestProperty("Authorization", s64);
InputStream content = uc.getInputStream();
return content;
}
catch (MalformedURLException e)
{
throw new ServletException(e);
}
catch (FileNotFoundException e)
{
throw e;
}
catch (IOException e)
{
throw new ServletException(e);
}
}


/** quick n dirty debugging function: append the message to "/tmp/logAnnotea.txt" */
static private void debug(Object o)
{
if(!DEBUG) return;
try
{
File f= new File("/tmp/logAnnotea.txt");
PrintWriter pout= new PrintWriter(new FileWriter(f,true));
pout.println("###"+System.currentTimeMillis()+"########");
pout.println(o);
pout.flush();
pout.close();

}
catch (Exception err)
{
err.printStackTrace();
}

}

/** serialize a DOM node to a String, for debugging purposes */
private String DOM2String(Node doc)throws ServletException
{
StringWriter out= new StringWriter();
printDOM(new PrintWriter(out,true),doc);
return out.toString();
}

/** print a DOM node for debugging purposes */
private static void printDOM(PrintWriter log,Node doc) throws ServletException
{
Source source = new DOMSource(doc);
Result result = new StreamResult(log);


try
{
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(source, result);
}
catch(TransformerException error)
{
error.printStackTrace();
error.printStackTrace(log);
log.flush();
throw new ServletException(error);
}
}


}
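
The servlet finds a bookmark on connotea through the MD5 hash of its URL (see toMD5 above). As a side note, here is a minimal stand-alone sketch of that hashing step, so it can be tried outside of a servlet container. The class name Md5Sketch and the method md5Hex are hypothetical names of mine, and encoding the string as UTF-8 is an assumption: I have not checked which encoding connotea expects for non-ASCII URLs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper, NOT part of the servlet above. */
class Md5Sketch
{
    /** return the lowercase hexadecimal MD5 digest of s, encoded as UTF-8 (an assumption) */
    static String md5Hex(String s)
    {
        try
        {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest)
            {
                // write the high nibble then the low nibble of each byte
                hex.append(Character.forDigit((b >> 4) & 0xF, 16));
                hex.append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        }
        catch (NoSuchAlgorithmException err)
        {
            // every Java platform is required to provide MD5
            throw new IllegalStateException(err);
        }
    }

    public static void main(String[] args)
    {
        // print the hash of the URL given on the command line, or of "abc" by default
        System.out.println(md5Hex(args.length > 0 ? args[0] : "abc"));
    }
}
```

Working on nibbles avoids the sign-extension trap of Integer.toHexString on negative bytes, and the result is always 32 characters long.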