29 September 2009

There can be only one


Image via wikipedia



computed with wpsubcat

27 September 2009

Extracting Scientists & SF writers from Wikipedia.


Images via wikipedia
In a recent post on FriendFeed, Christopher Harris asked: "Do you know of any science fiction writer who is/was also a scientist?" My first approach to retrieving those names automatically was to use Freebase. For example, the following MQL query retrieves the people who are both Scientists and SF Writers.

[{
"id":null,
"name":null,
"type" : "/people/person",
"a:profession":[{"name":"Scientist"}],
"b:profession":[{"name":"Science-Fiction Writer"}],
"limit":100

}]
The MQL query Editor returned the following result:
{
"code": "/api/status/ok",
"result": [
{
"a:profession": [{
"name": "Scientist"
}],
"b:profession": [{
"name": "Science-fiction writer"
}],
"id": "/en/edward_llewellyn",
"name": "Edward Llewellyn",
"type": "/people/person"
},
{
"a:profession": [{
"name": "Scientist"
}],
"b:profession": [{
"name": "Science-fiction writer"
}],
"id": "/en/konrad_fialkowski",
"name": "Konrad Fiałkowski",
"type": "/people/person"
}
],
"status": "200 OK",
"transaction_id": "cache;cache01.p01.sjc1:8101;2009-09-27T15:58:11Z;0002"
}
Only two people! That's not much. The articles in Wikipedia, as well as in Freebase, are classified using hierarchical categories (sadly, the hierarchy is not an acyclic graph), but there is no tool to find the articles matching the sub-categories. So you would have to repeat this query for the "British scientists", the "French biologists", etc. (By the way, I think Wikipedians should not have been allowed to mix two distinct kinds of categories, e.g. profession and nationality: it messes up the classification.) Do you know if this can be achieved using SPARQL and DBpedia?

Then I wrote a Java tool that uses the Wikipedia API to extract the pages belonging to a given Wikipedia category. This tool, "wpsubcat", is available here: http://code.google.com/p/lindenb/downloads/list and requires BerkeleyDB Java Edition to store the temporary results. The source code is available here: WPSubCat.
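To give an idea of what such a traversal involves, here is a minimal sketch. It is not the wpsubcat source: the class name CategoryWalk, the method subCategories and the in-memory map standing in for the Wikipedia API are all assumptions made for illustration. It walks the category graph breadth-first, stops at a maximum depth (like the -d option), and remembers the categories already seen so that cycles in the hierarchy cannot cause an endless loop.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Depth-limited, cycle-safe walk over a category graph (illustrative only). */
public class CategoryWalk {

    /**
     * Returns the start category plus every sub-category reachable
     * in at most maxDepth steps. 'children' is a hypothetical in-memory
     * stand-in for the answers a real tool would fetch from the Wikipedia API.
     */
    public static Set<String> subCategories(Map<String, List<String>> children,
                                            String start, int maxDepth) {
        Set<String> seen = new LinkedHashSet<>();      // visited categories, in discovery order
        Deque<String> queue = new ArrayDeque<>();      // categories waiting to be expanded
        Deque<Integer> depths = new ArrayDeque<>();    // depth of each queued category
        seen.add(start);
        queue.add(start);
        depths.add(0);
        while (!queue.isEmpty()) {
            String category = queue.remove();
            int depth = depths.remove();
            if (depth >= maxDepth) {
                continue;                               // do not expand below the depth limit
            }
            for (String child : children.getOrDefault(category, Collections.<String>emptyList())) {
                if (seen.add(child)) {                  // false when already visited: breaks cycles
                    queue.add(child);
                    depths.add(depth + 1);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Placeholder graph with a cycle (B points back to A).
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", Arrays.asList("B", "C"));
        graph.put("B", Arrays.asList("A", "D"));
        System.out.println(subCategories(graph, "A", 3));
    }
}
```

With a depth of 0 only the starting category is returned, which is how I read the "-d 0" option used in the second example below.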

Usage

-debug-level <java.util.logging.Level> default:OFF
-base <url> default:http://en.wikipedia.org
-ns <int> restrict results to the given namespace default:14 (Category)
-db-home BerkeleyDB default directory:/tmp/bdb
-d <integer> max recursion depth default:3

-add <category> add a starting article
OR
(stdin|files) containing articles' titles

Examples


Retrieve all the sub-categories of 'Category:Scientists'
java -cp je-3.3.75.jar:wpsubcat.jar org.lindenb.tinytools.WPSubCat \
-add "Category:Scientists" > catscientists.txt

Retrieve all the scientists.
java -cp je-3.3.75.jar:wpsubcat.jar org.lindenb.tinytools.WPSubCat \
-ns 0 -d 0 catscientists.txt > scientists.txt


Result


After a series of 'sort' and 'comm', the result is the following list (in fact, it is an underestimate: I've slightly improved the way the sub-categories are retrieved):

That's it

Pierre

24 September 2009

XSLT+Pubchem Reloaded : XSLT+Pubchem = Processing



In a previous post I showed how XSLT can be used to transform an NCBI/PubChem entry into an SVG figure. Here, I've slightly modified the original stylesheet to produce a script for Processing that generates a 3D model.
From processing.org: Processing is an open source programming language and environment for people who want to program images, animation, and interactions. (...) It is created to teach fundamentals of computer programming within a visual context and to serve as a software sketchbook and professional production tool. Processing is an alternative to proprietary software tools in the same domain.

The new XSLT stylesheet is available here:

Here the XML record of Lysergic Acid Diethylamide was transformed with pubchem2processing.xsl and the resulting document was uploaded to www.openprocessing.org.


http://www.openprocessing.org/visuals/?visualID=4768

Keys:
'a','z': -/+ opacity
'q','s': +/- zoom
'w','x': +/- atom


The script

//number of atoms
static final int ATOM_COUNT=49;

//array of X positions
static final float array_x[]=new float[]{-1.0624f,-0.1926f,2.4616f,-2.9068f,0.621f,-0.0035f,0.8984f,-1.5992f,-0.6594f,1.4746f,0.5697f,1.2663f,-1.0406f,1.88f,0.5459f,-1.8248f,2.2196f,0.5148f,1.8335f,1.1364f,-3.7848f,-3.2395f,-4.9351f,-2.5399f,1.5948f,1.5945f,-0.0372f,-2.5309f,-1.1865f,0.1964f,-1.5165f,-0.066f,1.4755f,0.7855f,2.5985f,-0.0002f,2.9867f,2.312f,1.0802f,-4.1574f,-3.2447f,-4.3255f,-2.9262f,-5.5678f,-4.6148f,-5.5631f,-2.7809f,-1.4522f,-2.8311};

//array of Y positions
static final float array_y[]=new float[]{1.9255f,-2.0756f,-4.5862f,1.6411f,-2.2073f,-1.4656f,-3.7309f,-0.2934f,-0.693f,-3.9266f,-1.7401f,-2.9517f,-0.6232f,-3.3703f,-2.4961f,1.1879f,-4.9333f,-0.9572f,-2.6029f,-1.3866f,0.7312f,3.0553f,0.2896f,3.6858f,-1.7188f,-4.1373f,-4.301f,-0.8592f,-0.5986f,-0.0044f,-0.1663f,-2.3585f,-1.9308f,-3.5629f,-5.864f,-0.0001f,-5.1463f,-2.9253f,-0.7582f,1.2483f,-0.137f,3.1734f,3.5585f,-0.4132f,-0.2018f,1.1385f,4.7518f,3.5919f,3.221};

//array of Z positions
static final float array_z[]=new float[]{3.398f,4.9603f,0.2014f,4.7838f,3.7188f,2.4987f,3.4375f,3.9986f,5.1388f,2.0807f,1.162f,1.0834f,2.6488f,-0.0856f,6.1555f,4.0282f,1.5131f,0.0024f,-1.2504f,-1.179f,5.5151f,4.8984f,4.6696f,6.0586f,3.8837f,4.1797f,3.5062f,4.0616f,6.0948f,5.1994f,1.7844f,7.0548f,6.2861f,6.1251f,1.9113f,0.0001f,-0.4557f,-2.1682f,-2.0652f,6.4075f,5.8928f,4.9775f,3.9766f,5.2214f,3.7461f,4.3781f,6.1202f,5.9687f,7.0063};

//array of atom names
static final char array_c[]=new char[]{'o','n','n','n','c','c','c','c','c','c','c','c','c','c','c','c','c','c','c','c','c','c','c','c','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h','h'};

//number of bonds
static final int BOUND_COUNT=52;

//bond start index (index into the atom arrays)
static final int bound_start[]=new int[]{0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,6,6,6,7,7,7,7,8,8,9,9,10,10,11,12,13,14,14,14,16,17,17,18,18,19,20,20,20,21,21,21,22,22,22,23,23,23};

//bond end index (index into the atom arrays)
static final int bound_end[]=new int[]{15,4,8,14,13,16,36,15,20,21,5,6,24,10,12,9,25,26,8,12,15,27,28,29,11,16,11,17,13,30,18,31,32,33,34,19,35,19,37,38,22,39,40,23,41,42,43,44,45,46,47,48};

//Name of this product
static final String title= "3981";


float mid_x=0f;
float mid_y=0f;
float mid_z=0f;
float alpha=150;
float zoom=10f;
float zoomAtom=1.0f;

void setup()
{

size(500, 500, P3D);
mid_x= center(array_x);
mid_y= center(array_y);
mid_z= center(array_z);
}

void draw()
{
lights();
background(0);

translate(width / 2, height / 2,0);
rotateY(map(mouseX, 0, width, 0, TWO_PI));
rotateZ(map(mouseY, 0, height, 0, -TWO_PI));
stroke(170, 0, 0);
for(int i=0; i< BOUND_COUNT;++i)
{
line(
xAngstrom(array_x[bound_start[i]] -mid_x ),
xAngstrom(array_y[bound_start[i]] -mid_y),
xAngstrom(array_z[bound_start[i]] -mid_z),
xAngstrom(array_x[bound_end[i]] -mid_x ),
xAngstrom(array_y[bound_end[i]] -mid_y ),
xAngstrom(array_z[bound_end[i]] -mid_z )
);
}
noStroke();
for(int i=0;i < ATOM_COUNT;++i)
{
pushMatrix();
translate(
xAngstrom(array_x[i] -mid_x ),
xAngstrom(array_y[i] -mid_y ),
xAngstrom(array_z[i] -mid_z )
);
fillAtom(array_c[i]);
sphere(zoomAtom*radiusOf(array_c[i]));
popMatrix();
}

}

float xAngstrom(float x)
{
return x*zoom;
}

int radiusOf(char c)
{
switch(c)
{
case 'o':case 'O': return 14;
case 'c':case 'C': return 12;
case 'h':case 'H': return 6;
default: return 10;
}
}

void fillAtom(char c)
{
switch(c)
{
case 'o':case 'O': fill(0,0,200,alpha); break;
case 'c':case 'C': fill(100,100,100,alpha); break;
case 'h':case 'H': fill(200,200,200,alpha); break;
case 'n':case 'N': fill(142,142,0,alpha); break;
case 's':case 'S': fill(142,0,142,alpha); break;
default: fill(142,142,142,alpha);break;
}
}

static float center(final float array[])
{
float t=0f;
for(int i=0;i< array.length;++i)
{
t+=array[i];
}
return t/float(array.length);
}

void keyPressed()
{
final float alphaShift=5;
final float zoomShift=0.5;
final float zoomAtomShift=0.1;
switch(key)
{
case 'a':case 'A': if(alpha-alphaShift >=0) this.alpha-=alphaShift; break;
case 'z':case 'Z': if(alpha+alphaShift <=255) this.alpha+=alphaShift; break;
case 'q':case 'Q': zoom+=zoomShift; break;
case 's':case 'S': if(zoom-zoomShift>0) zoom-=zoomShift; break;
case 'w':case 'W': zoomAtom+=zoomAtomShift; break;
case 'x':case 'X': if(zoomAtom-zoomAtomShift >0) zoomAtom-=zoomAtomShift; break;
default:break;
}
}


That's it
Pierre

23 September 2009

db_sql: the new utility for BerkeleyDB. My Notebook.

BerkeleyDB 4.8 has been released. This new version of the key/value storage engine comes with a new utility called db_sql.
From Oracle: Db_sql is a utility program that translates a schema description written in a SQL Data Definition Language dialect into C code that implements the schema using Berkeley DB. It is intended to provide a quick and easy means of getting started with Berkeley DB for users who are already conversant with SQL. It also introduces a convenient way to express a Berkeley DB schema in a format that is both external to the program that uses it and compatible with relational databases.

For my part, I still use Apache Velocity to generate this kind of code (see this older post).

Let's generate the C code for a simple database storing some SNPs. The heterozygosity, the flanking sequences and the observed variation will be stored, and the rs_id column will be indexed. My first attempt was:

CREATE DATABASE snpDatabase;

CREATE TABLE snp (
name varchar(50) NOT NULL PRIMARY KEY,
rs_id VARCHAR(20) NULL,
avHet float,
seq5 text,
observed text,
seq3 text,
class enum('unknown','single','in-del','het','microsatellite','named','mixed','mnp','insertion','deletion')
);

CREATE INDEX dbsnp_id ON snp(rs_id);


But it seemed that enums, the TEXT type and the NULL/NOT NULL modifiers are not supported. So my second try was:
CREATE DATABASE snpDatabase;

CREATE TABLE snp (
name varchar(50) PRIMARY KEY,
rs_id varchar(20) NULL ,
avHet float,
seq5 varchar(300),
observed varchar(20),
seq3 varchar(300)
);

CREATE INDEX dbsnp_id ON snp(rs_id);


Invoking db_sql


The following command generates the C code for managing the SNPs in the database, as well as a test. The code contains the structure 'struct _snp_data' describing a SNP, and the functions for opening/closing the database, inserting/removing a _snp_data, and serializing/de-serializing the structure:
/usr/local/BerkeleyDB.4.8/bin/db_sql -i schema.sql -o snp.c -h snp.h -t snp_test.c
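
As a rough illustration of the record layout used by the generated serialize_snp_data/deserialize_snp_data functions (listed in snp.c below), here is a small Java analogue. It is not db_sql output: the class SnpRecordCodec and its Snp holder are hypothetical names, the strings are assumed to be plain ASCII, and Java's ByteBuffer writes the double in big-endian order whereas the C code copies it in the machine's native order. Each string is written with a terminating 0 byte, the double is written raw, and the fields follow the order of the columns in the schema.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative Java analogue of the snp record layout (not generated by db_sql). */
public class SnpRecordCodec {

    /** Plain holder mirroring the columns of the 'snp' table. */
    public static class Snp {
        public String name, rsId, seq5, observed, seq3;
        public double avHet;
    }

    /** Writes the ASCII bytes of s followed by a terminating 0 byte, like a C string. */
    private static void putCString(ByteBuffer buf, String s) {
        buf.put(s.getBytes(StandardCharsets.US_ASCII));
        buf.put((byte) 0);
    }

    /** Reads bytes up to, and consuming, the next 0 byte. */
    private static String getCString(ByteBuffer buf) {
        StringBuilder sb = new StringBuilder();
        for (byte b = buf.get(); b != 0; b = buf.get()) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    /** Packs the fields in schema order: name, rs_id, avHet, seq5, observed, seq3. */
    public static byte[] serialize(Snp snp) {
        int size = snp.name.length() + snp.rsId.length() + snp.seq5.length()
                 + snp.observed.length() + snp.seq3.length()
                 + 5                // one terminating 0 byte per string
                 + Double.BYTES;    // the raw double
        ByteBuffer buf = ByteBuffer.allocate(size);
        putCString(buf, snp.name);
        putCString(buf, snp.rsId);
        buf.putDouble(snp.avHet);
        putCString(buf, snp.seq5);
        putCString(buf, snp.observed);
        putCString(buf, snp.seq3);
        return buf.array();
    }

    /** Reads the fields back in the same order. */
    public static Snp deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        Snp snp = new Snp();
        snp.name = getCString(buf);
        snp.rsId = getCString(buf);
        snp.avHet = buf.getDouble();
        snp.seq5 = getCString(buf);
        snp.observed = getCString(buf);
        snp.seq3 = getCString(buf);
        return snp;
    }

    public static void main(String[] args) {
        Snp snp = new Snp();            // placeholder values, not real dbSNP data
        snp.name = "snp1"; snp.rsId = "rs1"; snp.avHet = 0.5;
        snp.seq5 = "A"; snp.observed = "A/G"; snp.seq3 = "T";
        Snp copy = deserialize(serialize(snp));
        System.out.println(copy.name + " " + copy.observed + " " + copy.avHet);
    }
}
```

Unlike the generated C code, this sketch does not check each string against the column length declared in the schema.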


snp.h

/*
* Header file for a Berkeley DB implementation
* generated from SQL DDL by db_sql
*/
#include <sys/types.h>
#include <sys/stat.h>
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include "db.h"

/*
* Array size constants.
*/
#define SNP_DATA_NAME_LENGTH 50
#define SNP_DATA_RS_ID_LENGTH 20
#define SNP_DATA_SEQ5_LENGTH 300
#define SNP_DATA_OBSERVED_LENGTH 20
#define SNP_DATA_SEQ3_LENGTH 300

/*
* Data structures representing the record layouts
*/
typedef struct _snp_data {
char name[SNP_DATA_NAME_LENGTH];
char rs_id[SNP_DATA_RS_ID_LENGTH];
double avHet;
char seq5[SNP_DATA_SEQ5_LENGTH];
char observed[SNP_DATA_OBSERVED_LENGTH];
char seq3[SNP_DATA_SEQ3_LENGTH];
} snp_data;


snp_data snp_data_specimen;

/*
* Macros for the maximum length of the
* records after serialization. This is
* the maximum size of the data that is stored
*/
#define SNP_DATA_SERIALIZED_LENGTH (sizeof(snp_data_specimen.name) + \
sizeof(snp_data_specimen.rs_id) + \
sizeof(snp_data_specimen.avHet) + \
sizeof(snp_data_specimen.seq5) + \
sizeof(snp_data_specimen.observed) + \
sizeof(snp_data_specimen.seq3))

/*
* These typedefs are prototypes for the user-written
* iteration callback functions, which are invoked during
* full iteration and secondary index queries
*/
typedef void (*snp_iteration_callback)(void *, snp_data *);

/*
* The environment creation/initialization function
*/
int create_snpDatabase_env(DB_ENV **envpp);

/*
* Database creation/initialization functions
*/
int create_snp_database(DB_ENV *envp, DB **dbpp);

/*
* Database removal functions
*/
int remove_snp_database(DB_ENV *envp);

/*
* Functions for inserting records by providing
* the full corresponding data structure
*/
int snp_insert_struct(DB *dbp, snp_data *snpp);

/*
* Functions for inserting records by providing
* each field value as a separate argument
*/
int snp_insert_fields(DB * dbp,
char *name,
char *rs_id,
double avHet,
char *seq5,
char *observed,
char *seq3);

/*
* Functions for retrieving records by key
*/
int get_snp_data(DB *dbp, char *snp_key, snp_data *data);

/*
* Functions for deleting records by key
*/
int delete_snp_key(DB *dbp, char *snp_key);

/*
* Functions for doing iterations over
* an entire primary database
*/
int snp_full_iteration(DB *dbp,
snp_iteration_callback user_func,
void *user_data);

/*
* Index creation and removal functions
*/
int create_dbsnp_id_secondary(DB_ENV *envp, DB *dbpp, DB **secondary_dbpp);

int remove_dbsnp_id_index(DB_ENV * envp);

int dbsnp_id_query_iteration(DB *secondary_dbp,
char *dbsnp_id_key,
snp_iteration_callback user_func,
void *user_data);

/*
* This convenience method invokes all of the
* environment and database creation methods necessary
* to initialize the complete BDB environment. It uses
* the global environment and database pointers declared
* below. You may bypass this function and use your own
* environment and database pointers, if you wish.
*/
int initialize_snpDatabase_environment();

extern DB_ENV * snpDatabase_envp;
extern DB *snp_dbp;
extern DB *dbsnp_id_dbp;


snp.c

#include "snp.h"


int create_snpDatabase_env(DB_ENV **envpp)
{
int ret, flags;
char *env_name = "./snpDatabase";

if ((ret = db_env_create(envpp, 0)) != 0) {
fprintf(stderr, "db_env_create: %s", db_strerror(ret));
return 1;
}

(*envpp)->set_errfile(*envpp, stderr);
flags = DB_CREATE | DB_INIT_MPOOL;


if ((ret = (*envpp)->open(*envpp, env_name, flags, 0)) != 0) {
(*envpp)->err(*envpp, ret, "DB_ENV->open: %s", env_name);
return 1;
}
return 0;
}

/*
* These are custom comparator functions for integer keys. They are
* needed to make integers sort well on little-endian architectures,
* such as x86. cf. discussion of btree comparators in 'Getting Started
* with Data Storage' manual.
*/
static int
compare_int(DB *dbp, const DBT *a, const DBT *b)
{
int ai, bi;

memcpy(&ai, a->data, sizeof(int));
memcpy(&bi, b->data, sizeof(int));
return (ai - bi);
}

int
compare_long(DB *dbp, const DBT *a, const DBT *b)
{
long ai, bi;

memcpy(&ai, a->data, sizeof(long));
memcpy(&bi, b->data, sizeof(long));
return (ai - bi);
}

/*
* A generic function for creating and opening a database
*/
int
create_database(DB_ENV *envp,
char *db_name,
DB **dbpp,
int flags,
DBTYPE type,
int moreflags,
int (*comparator)(DB *, const DBT *, const DBT *))
{
int ret;
FILE *errfilep;

if ((ret = db_create(dbpp, envp, 0)) != 0) {
envp->err(envp, ret, "db_create");
return ret;
}

if (moreflags != 0)
(*dbpp)->set_flags(*dbpp, moreflags);

if (comparator != NULL)
(*dbpp)->set_bt_compare(*dbpp, comparator);

envp->get_errfile(envp, &errfilep);
(*dbpp)->set_errfile(*dbpp, errfilep);
if ((ret = (*dbpp)->open(*dbpp, NULL, db_name,
NULL, type, flags, 0644)) != 0) {
(*dbpp)->err(*dbpp, ret, "DB->open: %s", db_name);
return ret;
}

return 0;
}

int create_snp_database(DB_ENV *envp, DB **dbpp)
{
return create_database(envp, "snp.db", dbpp,
DB_CREATE, DB_BTREE, 0, NULL);
}


int remove_snp_database(DB_ENV *envp)
{
return envp->dbremove(envp, NULL, "snp.db", NULL, 0);
}


int serialize_snp_data(snp_data *snpp, char *buffer)
{
size_t len;
char *p;

memset(buffer, 0, SNP_DATA_SERIALIZED_LENGTH);
p = buffer;


len = strlen(snpp->name) + 1;
assert(len <= SNP_DATA_NAME_LENGTH);
memcpy(p, snpp->name, len);
p += len;


len = strlen(snpp->rs_id) + 1;
assert(len <= SNP_DATA_RS_ID_LENGTH);
memcpy(p, snpp->rs_id, len);
p += len;

memcpy(p, &snpp->avHet, sizeof(snpp->avHet));
p += sizeof(snpp->avHet);

len = strlen(snpp->seq5) + 1;
assert(len <= SNP_DATA_SEQ5_LENGTH);
memcpy(p, snpp->seq5, len);
p += len;


len = strlen(snpp->observed) + 1;
assert(len <= SNP_DATA_OBSERVED_LENGTH);
memcpy(p, snpp->observed, len);
p += len;


len = strlen(snpp->seq3) + 1;
assert(len <= SNP_DATA_SEQ3_LENGTH);
memcpy(p, snpp->seq3, len);
p += len;


return p - buffer;
}

void deserialize_snp_data(char *buffer, snp_data *snpp)
{
size_t len;

memset(snpp, 0, sizeof(*snpp));

len = strlen(buffer) + 1;
assert(len <= SNP_DATA_NAME_LENGTH);
memcpy(snpp->name, buffer, len);
buffer += len;


len = strlen(buffer) + 1;
assert(len <= SNP_DATA_RS_ID_LENGTH);
memcpy(snpp->rs_id, buffer, len);
buffer += len;

memcpy(&snpp->avHet, buffer, sizeof(snpp->avHet));
buffer += sizeof(snpp->avHet);

len = strlen(buffer) + 1;
assert(len <= SNP_DATA_SEQ5_LENGTH);
memcpy(snpp->seq5, buffer, len);
buffer += len;


len = strlen(buffer) + 1;
assert(len <= SNP_DATA_OBSERVED_LENGTH);
memcpy(snpp->observed, buffer, len);
buffer += len;


len = strlen(buffer) + 1;
assert(len <= SNP_DATA_SEQ3_LENGTH);
memcpy(snpp->seq3, buffer, len);
buffer += len;

}


int snp_insert_struct( DB *dbp, snp_data *snpp)
{
DBT key_dbt, data_dbt;
char serialized_data[SNP_DATA_SERIALIZED_LENGTH];
int ret, serialized_size;
char *snp_key = snpp->name;

memset(&key_dbt, 0, sizeof(key_dbt));
memset(&data_dbt, 0, sizeof(data_dbt));

key_dbt.data = snp_key;
key_dbt.size = strlen(snp_key) + 1;

serialized_size = serialize_snp_data(snpp, serialized_data);

data_dbt.data = serialized_data;
data_dbt.size = serialized_size;

if ((ret = dbp->put(dbp, NULL, &key_dbt, &data_dbt, 0)) != 0) {
dbp->err(dbp, ret, "Inserting key %d", snp_key);
return -1;
}
return 0;
}

int snp_insert_fields(DB *dbp,
char *name,
char *rs_id,
double avHet,
char *seq5,
char *observed,
char *seq3)
{
snp_data data;
assert(strlen(name) < SNP_DATA_NAME_LENGTH);
strncpy(data.name, name, SNP_DATA_NAME_LENGTH);
assert(strlen(rs_id) < SNP_DATA_RS_ID_LENGTH);
strncpy(data.rs_id, rs_id, SNP_DATA_RS_ID_LENGTH);
data.avHet = avHet;
assert(strlen(seq5) < SNP_DATA_SEQ5_LENGTH);
strncpy(data.seq5, seq5, SNP_DATA_SEQ5_LENGTH);
assert(strlen(observed) < SNP_DATA_OBSERVED_LENGTH);
strncpy(data.observed, observed, SNP_DATA_OBSERVED_LENGTH);
assert(strlen(seq3) < SNP_DATA_SEQ3_LENGTH);
strncpy(data.seq3, seq3, SNP_DATA_SEQ3_LENGTH);
return snp_insert_struct(dbp, &data);
}


int get_snp_data(DB *dbp,
char *snp_key,
snp_data *data)
{
DBT key_dbt, data_dbt;
int ret;
char *canonical_key = snp_key;

memset(&key_dbt, 0, sizeof(key_dbt));
memset(&data_dbt, 0, sizeof(data_dbt));

key_dbt.data = canonical_key;
key_dbt.size = strlen(canonical_key) + 1;

if ((ret = dbp->get(dbp, NULL, &key_dbt, &data_dbt, 0)) != 0) {
dbp->err(dbp, ret, "Retrieving key %d", snp_key);
return ret;
}

assert(data_dbt.size <= SNP_DATA_SERIALIZED_LENGTH);

deserialize_snp_data(data_dbt.data, data);
return 0;
}


int delete_snp_key(DB *dbp, char *snp_key)
{
DBT key_dbt;
int ret;
char *canonical_key = snp_key;

memset(&key_dbt, 0, sizeof(key_dbt));

key_dbt.data = canonical_key;
key_dbt.size = strlen(canonical_key) + 1;

if ((ret = dbp->del(dbp, NULL, &key_dbt, 0)) != 0) {
dbp->err(dbp, ret, "deleting key %d", snp_key);
return ret;
}

return 0;
}


int snp_full_iteration(DB *dbp,
snp_iteration_callback user_func,
void *user_data)
{
DBT key_dbt, data_dbt;
DBC *cursorp;
snp_data deserialized_data;
int ret;

memset(&key_dbt, 0, sizeof(key_dbt));
memset(&data_dbt, 0, sizeof(data_dbt));

if ((ret = dbp->cursor(dbp, NULL, &cursorp, 0)) != 0) {
dbp->err(dbp, ret, "creating cursor");
return ret;
}

while ((ret = cursorp->get(cursorp, &key_dbt, &data_dbt, DB_NEXT)) == 0) {
deserialize_snp_data(data_dbt.data, &deserialized_data);
(*user_func)(user_data, &deserialized_data);
}

if (ret != DB_NOTFOUND) {
dbp->err(dbp, ret, "Full iteration");
cursorp->close(cursorp);
return ret;
}

cursorp->close(cursorp);

return 0;
}

int dbsnp_id_callback(DB *dbp,
const DBT *key_dbt,
const DBT *data_dbt,
DBT *secondary_key_dbt)
{

int ret;
snp_data deserialized_data;

deserialize_snp_data(data_dbt->data, &deserialized_data);

memset(secondary_key_dbt, 0, sizeof(*secondary_key_dbt));
secondary_key_dbt->size = strlen(deserialized_data.rs_id) + 1;
secondary_key_dbt->data = malloc(secondary_key_dbt->size);
memcpy(secondary_key_dbt->data, deserialized_data.rs_id,
secondary_key_dbt->size);

/* tell the caller to free memory referenced by secondary_key_dbt */
secondary_key_dbt->flags = DB_DBT_APPMALLOC;

return 0;
}

int create_dbsnp_id_secondary(DB_ENV *envp,
DB *primary_dbp,
DB **secondary_dbpp)
{
int ret;
char * secondary_name = "dbsnp_id.db";

if ((ret = create_database(envp, secondary_name, secondary_dbpp,
DB_CREATE, DB_BTREE, DB_DUPSORT, NULL)) != 0)
return ret;

if ((ret = primary_dbp->associate(primary_dbp, NULL, *secondary_dbpp,
&dbsnp_id_callback, DB_CREATE)) != 0) {
(*secondary_dbpp)->err(*secondary_dbpp, ret,
"DB->associate: %s.db", secondary_name);
return ret;
}
return 0;
}

int remove_dbsnp_id_index(DB_ENV *envp)
{
return envp->dbremove(envp, NULL, "dbsnp_id.db", NULL, 0);
}

int dbsnp_id_query_iteration(DB *secondary_dbp,
char *dbsnp_id_key,
snp_iteration_callback user_func,
void *user_data)
{
DBT key_dbt, data_dbt;
DBC *cursorp;
snp_data deserialized_data;
int ret;

memset(&key_dbt, 0, sizeof(key_dbt));
memset(&data_dbt, 0, sizeof(data_dbt));

if ((ret = secondary_dbp->cursor(secondary_dbp, NULL, &cursorp, 0)) != 0) {
secondary_dbp->err(secondary_dbp, ret, "creating cursor");
return ret;
}

key_dbt.data = dbsnp_id_key;
key_dbt.size = strlen(dbsnp_id_key) + 1;

for (ret = cursorp->get(cursorp, &key_dbt, &data_dbt, DB_SET);
ret == 0;
ret = cursorp->get(cursorp, &key_dbt, &data_dbt, DB_NEXT_DUP)) {
deserialize_snp_data(data_dbt.data, &deserialized_data);
(*user_func)(user_data, &deserialized_data);
}

if (ret != DB_NOTFOUND) {
secondary_dbp->err(secondary_dbp, ret, "Querying secondary");
return ret;
}

cursorp->close(cursorp);

return 0;
}

DB_ENV * snpDatabase_envp = NULL;
DB *snp_dbp = NULL;
DB *dbsnp_id_dbp = NULL;

int initialize_snpDatabase_environment()
{
if (create_snpDatabase_env(&snpDatabase_envp) != 0)
goto exit_error;

if (create_snp_database(snpDatabase_envp, &snp_dbp) != 0)
goto exit_error;

if (create_dbsnp_id_secondary(snpDatabase_envp, snp_dbp, &dbsnp_id_dbp) != 0)
goto exit_error;

return 0;

exit_error:

fprintf(stderr, "Stopping initialization because of error\n");
return -1;
}

snp_test.c


/*
* Simple test for a Berkeley DB implementation
* generated from SQL DDL by db_sql
*/

#include "snp.h"

/*
* These are the iteration callback functions. One is defined per
* database(table). They are used for both full iterations and for
* secondary index queries. When a retrieval returns multiple records,
* as in full iteration over an entire database, one of these functions
* is called for each record found
*/

void snp_iteration_callback_test(void *msg, snp_data *snp_record)
{
printf("In iteration callback, message is: %s\n", (char *)msg);

printf("snp->name: %s\n", snp_record->name);
printf("snp->rs_id: %s\n", snp_record->rs_id);
printf("snp->avHet: %lf\n", snp_record->avHet);
printf("snp->seq5: %s\n", snp_record->seq5);
printf("snp->observed: %s\n", snp_record->observed);
printf("snp->seq3: %s\n", snp_record->seq3);
}


main(int argc, char **argv)
{
int i;
int ret;

snp_data snp_record;
snp_dbp = NULL;
dbsnp_id_dbp = NULL;

/*
* Use the convenience method to initialize the environment.
* The initializations for each entity and environment can be
* done discretely if you prefer, but this is the easy way.
*/
ret = initialize_snpDatabase_environment();
if (ret != 0){
printf("Initialize error");
return ret;
}

/*
* Now that everything is initialized, insert a single
* record into each database, using the ...insert_fields
* functions. These functions take each field of the
* record as a separate argument
*/
ret = snp_insert_fields( snp_dbp, "ninety-nine", "ninety-nine", 99.5, "ninety-nine", "ninety-nine", "ninety-nine");
if (ret != 0){
printf("Insert error\n");
goto exit_error;
}


/*
* Next, retrieve the records just inserted, looking them up
* by their key values
*/

printf("Retrieval of snp record by key\n");
ret = get_snp_data( snp_dbp, "ninety-nine", &snp_record);

printf("snp.name: %s\n", snp_record.name);
printf("snp.rs_id: %s\n", snp_record.rs_id);
printf("snp.avHet: %lf\n", snp_record.avHet);
printf("snp.seq5: %s\n", snp_record.seq5);
printf("snp.observed: %s\n", snp_record.observed);
printf("snp.seq3: %s\n", snp_record.seq3);
if (ret != 0)
{
printf("Retrieve error\n");
goto exit_error;
}


/*
* Now try iterating over every record, using the ...full_iteration
* functions for each database. For each record found, the
* appropriate ...iteration_callback_test function will be invoked
* (these are defined above).
*/
ret = snp_full_iteration(snp_dbp, &snp_iteration_callback_test,
"retrieval of snp record through full iteration");
if (ret != 0){
printf("Full Iteration Error\n");
goto exit_error;
}


/*
* For the secondary indexes, query for the known keys. This also
* results in the ...iteration_callback_test function's being called
* for each record found.
*/
dbsnp_id_query_iteration(dbsnp_id_dbp, "ninety-nine",
&snp_iteration_callback_test,
"retrieval of snp record through dbsnp_id query");

/*
* Now delete a record from each database using its primary key.
*/
ret = delete_snp_key( snp_dbp, "ninety-nine");
if (ret != 0) {
printf("Delete error\n");
goto exit_error;
}


exit_error:
/*
* Close the secondary index databases
*/
if (dbsnp_id_dbp != NULL)
dbsnp_id_dbp->close(dbsnp_id_dbp, 0);


/*
* Close the primary databases
*/
if (snp_dbp != NULL)
snp_dbp->close(snp_dbp, 0);


/*
* Delete the secondary index databases
*/
remove_dbsnp_id_index(snpDatabase_envp);

/*
* Delete the primary databases
*/
remove_snp_database(snpDatabase_envp);

/*
* Finally, close the environment
*/
snpDatabase_envp->close(snpDatabase_envp, 0);
return ret;
}

Compiling and Running the test


export LD_LIBRARY_PATH=/usr/local/BerkeleyDB.4.8/lib
gcc snp_test.c snp.c -ldb
mkdir snpDatabase
./a.out
Retrieval of snp record by key
snp.name: ninety-nine
snp.rs_id: ninety-nine
snp.avHet: 99.500000
snp.seq5: ninety-nine
snp.observed: ninety-nine
snp.seq3: ninety-nine
In iteration callback, message is: retrieval of snp record through full iteration
snp->name: ninety-nine
snp->rs_id: ninety-nine
snp->avHet: 99.500000
snp->seq5: ninety-nine
snp->observed: ninety-nine
snp->seq3: ninety-nine
In iteration callback, message is: retrieval of snp record through dbsnp_id query
snp->name: ninety-nine
snp->rs_id: ninety-nine
snp->avHet: 99.500000
snp->seq5: ninety-nine
snp->observed: ninety-nine
snp->seq3: ninety-nine



That's it
Pierre

20 September 2009

From FriendFeed to Nucleic Acids Research.

Deepak Singh and Andrew Su have both already posted about it on their blogs: I'm proud to be the second author of a paper published in the "Database Issue" of Nucleic Acids Research.

The Gene Wiki: community intelligence applied to human gene annotation

Jon W. Huss III, Pierre Lindenbaum, Michael Martone, Donabel Roberts, Angel Pizarro, Faramarz Valafar, John B. Hogenesch and Andrew I. Su

Nucleic Acids Research, doi:10.1093/nar/gkp760

What I really like about this paper is how the collaboration started: last year Andrew asked for some help on FriendFeed, in the Life Scientists room:



... I sent an e-mail and said I could possibly help, "et voilà"!
Citing Andrew: I'd also be remiss if I didn't also note the critical role online collaboration played in this effort. Of the seven coauthors on this paper, two I've met only once in real life, and two I've never met in person. We are spread over four cities, five organizations, and nine time zones. Initiating and executing this collaboration happened virtually entirely online, aided by the FriendFeed Life Scientists room and Molecular and Cellular Biology WikiProject at Wikipedia. It was an eye-opener in terms of how effective online collaboration can be done.

Andrew, thank you again :-)


Pierre

Xalan part 3: BerkeleyDB+XSLT+pubmed

In my previous post I showed how to call MySQL from the Xalan XSLT engine. In this post, I'll show how a custom function for Xalan can return a new DOM/XML document that will later be used by the XSLT stylesheet. As a source of data, I'm going to create a key-value database with BerkeleyDB (Java Edition), storing strings as the keys and XML documents as the values.

The Database XMLStore


In the constructor, the BerkeleyDB environment is opened, and a DOM parser to parse the XML is created, as well as a Transformer to serialize this XML to a String.
DocumentBuilderFactory domFactory= DocumentBuilderFactory.newInstance();
domFactory.setCoalescing(true);
domFactory.setExpandEntityReferences(true);
domFactory.setIgnoringComments(true);
domFactory.setNamespaceAware(false);
domFactory.setValidating(false);
domFactory.setIgnoringElementContentWhitespace(true);
this.docBuilder= domFactory.newDocumentBuilder();

TransformerFactory tFactory=TransformerFactory.newInstance();
this.xmlSerializer=tFactory.newTransformer();
EnvironmentConfig envCfg= new EnvironmentConfig();
envCfg.setAllowCreate(true);
envCfg.setReadOnly(false);
this.env=new Environment(new File(envHome), envCfg);
DatabaseConfig cfg= new DatabaseConfig();
cfg.setAllowCreate(true);
cfg.setReadOnly(false);
this.id2xml= env.openDatabase(null, "id2xml", cfg);

The class XMLStore contains a method to PUT the XML/DOM document in the database.
public OperationStatus put(String id,Document dom) throws DatabaseException
{
DatabaseEntry key=new DatabaseEntry();
DatabaseEntry data=new DatabaseEntry();
StringBinding.stringToEntry(id, key);
StringWriter w= new StringWriter();
try
{
this.xmlSerializer.transform(
new DOMSource(dom),
new StreamResult(w)
);
}
catch (TransformerException e)
{
throw new DatabaseException(e);
}
StringBinding.stringToEntry(w.toString(), data);
return this.id2xml.put(null, key, data);
}

We also need a GET method to retrieve an XML document from a given key. This will be the new document processed by the XSLT stylesheet:
public Document get(String id) throws DatabaseException
{
Document dom= this.docBuilder.newDocument();
Element root= dom.createElement("Query");
root.setAttribute("key", String.valueOf(id));
dom.appendChild(root);

if(id==null)
{
root.setAttribute("status", "failure");
root.appendChild(dom.createTextNode("key is null"));
return dom;
}

DatabaseEntry key=new DatabaseEntry();
DatabaseEntry data=new DatabaseEntry();
StringBinding.stringToEntry(id, key);
if(this.id2xml.get(null, key, data, LockMode.DEFAULT)!=OperationStatus.SUCCESS)
{
root.setAttribute("status", "failure");
root.appendChild(dom.createTextNode("key not found"));
return dom;
}
try
{
Document doc = this.docBuilder.parse(new InputSource(new StringReader(StringBinding.entryToString(data))));
root.setAttribute("status", "success");
root.appendChild(dom.importNode(doc.getDocumentElement(),true));
return dom;
}
catch (Exception e)
{
throw new DatabaseException(e);
}
}
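
To try the PUT/GET logic without installing BerkeleyDB, the same round trip (DOM to String, then String back to DOM wrapped in a Query element) can be sketched with a plain HashMap. This is only a stand-in under stated assumptions: the class InMemoryXmlStore and the element name "Article" are hypothetical, nothing is persisted, and a null key is simply reported as "key not found".

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** In-memory stand-in for XMLStore: a HashMap replaces the BerkeleyDB database. */
public class InMemoryXmlStore {
    private final Map<String, String> id2xml = new HashMap<>();
    private final DocumentBuilder docBuilder;
    private final Transformer xmlSerializer;

    public InMemoryXmlStore() throws Exception {
        this.docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        this.xmlSerializer = TransformerFactory.newInstance().newTransformer();
    }

    /** Serializes the DOM to a String and stores it under the given key. */
    public void put(String id, Document dom) throws Exception {
        StringWriter w = new StringWriter();
        xmlSerializer.transform(new DOMSource(dom), new StreamResult(w));
        id2xml.put(id, w.toString());
    }

    /** Always returns a Query document; its 'status' attribute tells whether the key was found. */
    public Document get(String id) throws Exception {
        Document dom = docBuilder.newDocument();
        Element root = dom.createElement("Query");
        root.setAttribute("key", String.valueOf(id));
        dom.appendChild(root);

        String xml = id2xml.get(id);
        if (xml == null) {
            root.setAttribute("status", "failure");
            root.appendChild(dom.createTextNode("key not found"));
            return dom;
        }
        // Parse the stored text and graft its root element under <Query>.
        Document stored = docBuilder.parse(new InputSource(new StringReader(xml)));
        root.setAttribute("status", "success");
        root.appendChild(dom.importNode(stored.getDocumentElement(), true));
        return dom;
    }

    public static void main(String[] args) throws Exception {
        InMemoryXmlStore store = new InMemoryXmlStore();
        Document doc = store.docBuilder.newDocument();
        doc.appendChild(doc.createElement("Article"));   // hypothetical element name
        store.put("1", doc);
        System.out.println(store.get("1").getDocumentElement().getAttribute("status"));
        System.out.println(store.get("2").getDocumentElement().getAttribute("status"));
    }
}
```

Because a missing key still yields a well-formed Query document, a stylesheet can test the status attribute instead of having to deal with an exception.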

Full Source code of XMLStore.java

package test;
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;


import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;


import com.sleepycat.bind.tuple.StringBinding;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;


public class XMLStore
{
/** BerkeleyDB environment */
private Environment env;
/** Database mapping String to DOM */
private Database id2xml;
/** DOM builder */
private DocumentBuilder docBuilder;
/** DOM to String factory */
private Transformer xmlSerializer;

public XMLStore(String envHome) throws Exception
{
DocumentBuilderFactory domFactory= DocumentBuilderFactory.newInstance();
domFactory.setCoalescing(true);
domFactory.setExpandEntityReferences(true);
domFactory.setIgnoringComments(true);
domFactory.setNamespaceAware(false);
domFactory.setValidating(false);
domFactory.setIgnoringElementContentWhitespace(true);
this.docBuilder= domFactory.newDocumentBuilder();

TransformerFactory tFactory=TransformerFactory.newInstance();
this.xmlSerializer=tFactory.newTransformer();

EnvironmentConfig envCfg= new EnvironmentConfig();
envCfg.setAllowCreate(true);
envCfg.setReadOnly(false);
this.env=new Environment(new File(envHome), envCfg);
DatabaseConfig cfg= new DatabaseConfig();
cfg.setAllowCreate(true);
cfg.setReadOnly(false);
this.id2xml= env.openDatabase(null, "id2xml", cfg);
}

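public void close()
{
// close the database first, then the environment, even if the first step fails
try {
id2xml.close();
} catch(DatabaseException err) {
System.err.println("Cannot close database: "+err.getMessage());
}
try {
env.close();
} catch(DatabaseException err) {
System.err.println("Cannot close environment: "+err.getMessage());
}
}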

public Document get(String id) throws DatabaseException
{
Document dom= this.docBuilder.newDocument();
Element root= dom.createElement("Query");
root.setAttribute("key", String.valueOf(id));
dom.appendChild(root);

if(id==null)
{
root.setAttribute("status", "failure");
root.appendChild(dom.createTextNode("key is null"));
return dom;
}

DatabaseEntry key=new DatabaseEntry();
DatabaseEntry data=new DatabaseEntry();
StringBinding.stringToEntry(id, key);
if(this.id2xml.get(null, key, data, LockMode.DEFAULT)!=OperationStatus.SUCCESS)
{
root.setAttribute("status", "failure");
root.appendChild(dom.createTextNode("key not found"));
return dom;
}
try
{
Document doc = this.docBuilder.parse(new InputSource(new StringReader(StringBinding.entryToString(data))));
root.setAttribute("status", "success");
root.appendChild(dom.importNode(doc.getDocumentElement(),true));
return dom;
}
catch (Exception e)
{
throw new DatabaseException(e);
}
}

public OperationStatus put(String id,Document dom) throws DatabaseException
{
DatabaseEntry key=new DatabaseEntry();
DatabaseEntry data=new DatabaseEntry();
StringBinding.stringToEntry(id, key);
StringWriter w= new StringWriter();
try
{
this.xmlSerializer.transform(
new DOMSource(dom),
new StreamResult(w)
);
}
catch (TransformerException e)
{
throw new DatabaseException(e);
}
StringBinding.stringToEntry(w.toString(), data);
return this.id2xml.put(null, key, data);
}

public static void main(String[] args) {
XMLStore store=null;
try
{
String dbHome=null;
int optind=0;
while(optind< args.length)
{
if(args[optind].equals("-h"))
{
System.err.println("-D berkeleyDB home");
return;
}
else if(args[optind].equals("-D"))
{
dbHome= args[++optind];
}
else if(args[optind].equals("--"))
{
optind++;
break;
}
else if(args[optind].startsWith("-"))
{
System.err.println("Unknown option "+args[optind]);
return;
}
else
{
break;
}
++optind;
}
if(dbHome==null)
{
System.err.println("-D missing");
return;
}
store= new XMLStore(dbHome);
int nargs= args.length - optind;
if(nargs==3 &&
args[optind].equals("put"))
{
Document dom= store.docBuilder.parse(new InputSource(
new StringReader(args[optind+2])));
OperationStatus status=store.put(args[optind+1], dom);
System.out.println("put \""+args[optind+1]+"\":"+status);
}
else if(nargs==3 &&
args[optind].equals("put-file"))
{
Document dom= store.docBuilder.parse(new File(args[optind+2]));
OperationStatus status=store.put(args[optind+1], dom);
System.out.println("put-file \""+args[optind+1]+"\":"+status);
}
else if(nargs==2 &&
args[optind].equals("get"))
{
Document dom= store.get(args[optind+1]);
store.xmlSerializer.transform(new DOMSource(dom),
new StreamResult(System.out))
;
}
else
{
System.err.println("Illegal arguments.");
}

}
catch(Throwable err)
{
err.printStackTrace();
}
finally
{
if(store!=null) store.close();
}
}
}

Compile & Package

javac -cp je-3.3.75.jar test/XMLStore.java
jar cvf xmlstore.jar test

Test


The PubMed articles with ids 15677533 and 18398438 were downloaded. These documents are then put in the database.
java -cp je-3.3.75.jar:xmlstore.jar test.XMLStore -D /tmp/bdb put-file 15677533 pubmed_15677533.xml
put-file "15677533":OperationStatus.SUCCESS
java -cp je-3.3.75.jar:xmlstore.jar test.XMLStore -D /tmp/bdb put-file 18398438 pubmed_18398438.xml
put-file "18398438":OperationStatus.SUCCESS

Let's retrieve the document with the id "18398438":
java -cp je-3.3.75.jar:xmlstore.jar test.XMLStore -D /tmp/bdb get 18398438

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Query key="18398438" status="success"><PubmedArticleSet>
<PubmedArticle><MedlineCitation Owner="NLM" Status="MEDLINE">
(...)</PubmedArticle></PubmedArticleSet></Query>

The stylesheet


We're going to process the following XML NCBI/ELink document: for one SNP, it contains a list of the PMIDs of the associated papers.
<eLinkResult>
<LinkSet>
<DbFrom>snp</DbFrom>
<IdList>
<Id>1802710</Id>
</IdList>
<LinkSetDb>
<DbTo>pubmed</DbTo>
<LinkName>snp_pubmed</LinkName>
<Link>
<Id>18398438</Id>
</Link>
<Link>
<Id>15677533</Id>
</Link>
<Link>
<Id>15010842</Id>
</Link>
</LinkSetDb>
</LinkSet>
</eLinkResult>
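
To check which PMIDs the stylesheet will look up, the same selection can be done from plain Java. This is only a sketch: it uses the JDK's built-in XPath engine with the expression LinkSetDb[DbTo='pubmed']/Link/Id taken from the stylesheet, and the class name ELinkPmids is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ELinkPmids {
    /** Returns the PMIDs linked to the SNP, in document order. */
    static List<String> pmids(String elinkXml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(elinkXml)));
            // same path as the xsl:for-each of the stylesheet, made absolute
            NodeList ids = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/eLinkResult/LinkSet/LinkSetDb[DbTo='pubmed']/Link/Id",
                dom, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                result.add(ids.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<eLinkResult><LinkSet><DbFrom>snp</DbFrom>"
            + "<IdList><Id>1802710</Id></IdList>"
            + "<LinkSetDb><DbTo>pubmed</DbTo><LinkName>snp_pubmed</LinkName>"
            + "<Link><Id>18398438</Id></Link>"
            + "<Link><Id>15677533</Id></Link>"
            + "<Link><Id>15010842</Id></Link>"
            + "</LinkSetDb></LinkSet></eLinkResult>";
        System.out.println(pmids(xml));
    }
}
```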

In the header of the stylesheet, the use of XMLStore is declared:
<xsl:stylesheet
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
version='1.0'
xmlns:xstore="xalan://test.XMLStore"
extension-element-prefixes="xstore"
>
A new XMLStore is created. It stores its data in the directory "/tmp/bdb". The database is closed at the end of the processing.
<xsl:param name="directory" select="'/tmp/bdb'"/>
<xsl:variable name="db" select="xstore:new($directory)"/>

<xsl:template match="/eLinkResult">
<html><body>
<xsl:apply-templates select="LinkSet"/>
</body></html>
<xsl:value-of select="xstore:close($db)"/>
</xsl:template>
Each time a PMID is seen, XMLStore is called: it returns a new XML document, and the title of the paper is extracted from that document.
<xsl:variable name="result" select="xstore:get($db,.)"/>
<li>
<b>Pubmed Id <xsl:value-of select="."/></b>:
<xsl:choose>
<xsl:when test="$result/Query/@status='success'">
<xsl:value-of select="$result/Query/PubmedArticleSet/PubmedArticle/MedlineCitation/Article/ArticleTitle"/>.
</xsl:when>
<xsl:otherwise>
<span style="color:red;"><xsl:value-of select="$result/Query"/></span>
</xsl:otherwise>
</xsl:choose>
</li>

Full source code of the stylesheet

<xsl:stylesheet version="1.0" extension-element-prefixes="xstore"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
xmlns:xstore="xalan://test.XMLStore"
>

<xsl:output method="xml" indent="yes"/>
<xsl:param name="directory" select="'/tmp/bdb'"/>
<xsl:variable name="db" select="xstore:new($directory)"/>

<xsl:template match="/eLinkResult">
<html><body>
<xsl:apply-templates select="LinkSet"/>
</body></html>
<xsl:value-of select="xstore:close($db)"/>
</xsl:template>

<xsl:template match="LinkSet">
<h1><xsl:value-of select="DbFrom"/></h1>
<ul>
<xsl:for-each select="IdList/Id">
<li><xsl:value-of select="."/></li>
</xsl:for-each>
</ul>
<h2>Related Pubmed</h2>
<ul>
<xsl:for-each select="LinkSetDb[DbTo='pubmed']/Link/Id">
<xsl:variable name="result" select="xstore:get($db,.)"/>
<li>
<b>Pubmed Id <xsl:value-of select="."/></b>:
<xsl:choose>
<xsl:when test="$result/Query/@status='success'">
<xsl:value-of select="$result/Query/PubmedArticleSet/PubmedArticle/MedlineCitation/Article/ArticleTitle"/>.
</xsl:when>
<xsl:otherwise>
<span style="color:red;"><xsl:value-of select="$result/Query"/></span>
</xsl:otherwise>
</xsl:choose>
</li>
</xsl:for-each>
</ul>
</xsl:template>
</xsl:stylesheet>

Running the stylesheet


java -cp ${XALAN}/org.apache.xalan_2.7.1.v200905122109.jar:\
${XALAN}/org.apache.xml.serializer_2.7.1.v200902170519.jar:\
je-3.3.75.jar:\
xmlstore.jar \
org.apache.xalan.xslt.Process -IN elink.fcgi.xml -XSL elink2html.xsl

Result



snp

  • 1802710

Related Pubmed


  • Pubmed Id 18398438:Preferential reciprocal transfer of paternal/maternal DLK1 alleles to obese children: first evidence of polar overdominance in humans..
  • Pubmed Id 15677533:Imprinting, expression, and localisation of DLK1 in Wilms tumours.
  • Pubmed Id 15010842:key not found



That's it

Pierre

19 September 2009

XSLT+MySQL=Append GeneOntology terms to TinySeq

In my previous post I showed how to add new functions to the Xalan XSLT engine. In this post I'll show how to connect to a MySQL server via Xalan. A TinySeq XML document will be transformed with Xalan and an XSLT stylesheet querying the public GeneOntology MySQL server. This stylesheet will search the GO terms for each sequence.

The TinySeq Sequences

The sequences were downloaded from the NCBI.
<TSeqSet>
<TSeq>
<TSeq_seqtype value="protein"/>
<TSeq_gi>124617</TSeq_gi>
<TSeq_accver>P01308.1</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>RecName: Full=Insulin; Contains: RecName: Full=Insulin B chain; Contains: RecName: Full=Insulin A chain; Flags: Precursor</TSeq_defline>
<TSeq_length>110</TSeq_length>
<TSeq_sequence>MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN</TSeq_sequence>
</TSeq>

<TSeq>
<TSeq_seqtype value="protein"/>
<TSeq_gi>3183544</TSeq_gi>
<TSeq_accver>P11940.2</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>RecName: Full=Polyadenylate-binding protein 1; Short=Poly(A)-binding protein 1; Short=PABP 1</TSeq_defline>
<TSeq_length>636</TSeq_length>
<TSeq_sequence>MNPSAPSYPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQQPADAERALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAKEFTNVYIKNFGEDMDDERLKDLFGKFGPALSVKVMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFEQMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAHLTNQYMQRMASVRAVPNPVINPYQPAPPSGYFMAAIPQTQNRAAYYPPSQIAQLRPSPRWTAQGARPHPFQNMPGAIRPAAPRPPFSTMRPASSQVPRVMSTQRVANTSTQTMGPRPAAAAAAATPAVRTVPQYKYAAGVRNPQQHLNAQPQVTMQQPAVHVQGQEPLTASMLASAPPQEQKQMLGERLFPLIQAMHPTLAGKITGMLLEIDNSELLHMLESPESLRSKVDEAVAVLQAHQAKEAAQKAVNSATGVPTV</TSeq_sequence>
</TSeq>
</TSeqSet>

The stylesheet


In the header, the SQL extension for Xalan is declared:
<xsl:stylesheet
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
version='1.0'
xmlns:sql="org.apache.xalan.lib.sql.XConnection"
extension-element-prefixes="sql"
>

A few variables are required to define the mysql connection:

<xsl:param name="driver" select="'com.mysql.jdbc.Driver'"/>
<xsl:param name="datasource" select="'jdbc:mysql://mysql.ebi.ac.uk:4085/go_latest'"/>
<xsl:param name="query" select="'SELECT * FROM chromInfo limit 10'"/>
<xsl:param name="passwd" select="'amigo'"/>
<xsl:param name="username" select="'go_select'"/>

A new SQL object is created:
<xsl:variable name="db" select="sql:new()"/>

A new connection is created when the document is parsed. This connection is released at the end.
<xsl:template match="/">
<xsl:if test="not(sql:connect($db, $driver, $datasource, $username, $passwd))" >
<xsl:copy-of select="sql:getError($db)/ext-error" />
<xsl:message terminate="yes">Error Connecting to the Database</xsl:message>
</xsl:if>
<xsl:apply-templates/>
<xsl:value-of select="sql:close($db)"/>
</xsl:template>

Each time a TSeq is found, a new SQL query is built. The variable $xref_key holds the accession without its version suffix (e.g. P01308 for P01308.1). I'm not a GO specialist, so I hope the query is OK.
<xsl:variable name="sql">
select distinct
term.acc as "termAcc",
term.name as "termName",
term.term_type as "termType"
from
dbxref,
term,
association,
gene_product,
species
where
association.term_id=term.id and
gene_product.dbxref_id=dbxref.id and
gene_product.id=association.gene_product_id and
gene_product.species_id=species.id and
term.is_obsolete=0 and
dbxref.xref_key="<xsl:value-of select="$xref_key"/>" and
species.ncbi_taxa_id=<xsl:value-of select="TSeq_taxid"/>
</xsl:variable>
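
For comparison, here is roughly what the same lookup could look like in plain JDBC. This is a sketch, not what Xalan does internally: the class GoQuery and its methods are hypothetical, the tables and joins are copied from the query above, and the two values are bound with '?' placeholders instead of being pasted into the SQL text. It needs an already opened java.sql.Connection to the GO database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class GoQuery {
    /** Same tables and joins as the stylesheet, with '?' placeholders for the two values. */
    static final String SQL =
        "select distinct term.acc, term.name, term.term_type "
      + "from dbxref, term, association, gene_product, species "
      + "where association.term_id=term.id "
      + "and gene_product.dbxref_id=dbxref.id "
      + "and gene_product.id=association.gene_product_id "
      + "and gene_product.species_id=species.id "
      + "and term.is_obsolete=0 "
      + "and dbxref.xref_key=? "
      + "and species.ncbi_taxa_id=?";

    /** Drops the version suffix, like substring-before(TSeq_accver,'.') in the stylesheet. */
    static String xrefKey(String accver) {
        int dot = accver.indexOf('.');
        return dot == -1 ? accver : accver.substring(0, dot);
    }

    /** Runs the query on an open connection and returns the GO accessions. */
    static List<String> goAccessions(Connection con, String accver, int taxId)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            ps.setString(1, xrefKey(accver));
            ps.setInt(2, taxId);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> accs = new ArrayList<>();
                while (rs.next()) {
                    accs.add(rs.getString(1)); // first column: term.acc
                }
                return accs;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(xrefKey("P01308.1"));
    }
}
```

Binding the accession as a parameter avoids any quoting problem if an identifier ever contains a double quote, which the string concatenation in the stylesheet cannot guarantee.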

The query is sent to the MySQL server.
<xsl:variable name="table" select='sql:query($db, $sql)'/>

And the SQL result is processed as a regular stylesheet
<xsl:apply-templates select="$table" mode="sql"/>

Complete source code for the stylesheet:
<xsl:stylesheet version="1.0" extension-element-prefixes="sql"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
xmlns:sql="org.apache.xalan.lib.sql.XConnection"
>

<xsl:output method="xml" indent="yes"/>

<xsl:param name="driver" select="'com.mysql.jdbc.Driver'"/>
<xsl:param name="datasource" select="'jdbc:mysql://mysql.ebi.ac.uk:4085/go_latest'"/>
<xsl:param name="query" select="'SELECT * FROM chromInfo limit 10'"/>
<xsl:param name="passwd" select="'amigo'"/>
<xsl:param name="username" select="'go_select'"/>

<xsl:variable name="db" select="sql:new()"/>



<xsl:template match="/">
<xsl:if test="not(sql:connect($db, $driver, $datasource, $username, $passwd))">
<xsl:copy-of select="sql:getError($db)/ext-error"/>
<xsl:message terminate="yes">Error Connecting to the Database</xsl:message>
</xsl:if>
<xsl:apply-templates/>
<xsl:value-of select="sql:close($db)"/>
</xsl:template>

<xsl:template match="TSeq">
<xsl:element name="TSeq">
<xsl:apply-templates/>
<xsl:if test="TSeq_seqtype/@value='protein' and TSeq_accver and TSeq_taxid">
<xsl:variable name="xref_key">
<xsl:choose>
<xsl:when test="contains(TSeq_accver,'.')">
<xsl:value-of select="substring-before(TSeq_accver,'.')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="TSeq_accver"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:variable name="sql">
select distinct
term.acc as "termAcc",
term.name as "termName",
term.term_type as "termType"
from
dbxref,
term,
association,
gene_product,
species
where
association.term_id=term.id and
gene_product.dbxref_id=dbxref.id and
gene_product.id=association.gene_product_id and
gene_product.species_id=species.id and
term.is_obsolete=0 and
dbxref.xref_key="<xsl:value-of select="$xref_key"/>" and
species.ncbi_taxa_id=<xsl:value-of select="TSeq_taxid"/>
</xsl:variable>

<xsl:variable name="table" select="sql:query($db, $sql)"/>

<xsl:apply-templates select="$table" mode="sql"/>
</xsl:if>
</xsl:element>
</xsl:template>


<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>



<xsl:template match="sql" mode="sql">
<xsl:if test="count(row-set)>0">
<GeneOntology>
<xsl:apply-templates select="row-set/row" mode="sql"/>
</GeneOntology>
</xsl:if>
</xsl:template>

<xsl:template match="row" mode="sql">
<xsl:element name="GoTerm">
<xsl:attribute name="src">http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=<xsl:value-of select="col[@column-label='termAcc']"/></xsl:attribute>
<acn><xsl:value-of select="col[@column-label='termAcc']"/></acn>
<name><xsl:value-of select="col[@column-label='termName']"/></name>
<type><xsl:value-of select="col[@column-label='termType']"/></type>
</xsl:element>
</xsl:template>

</xsl:stylesheet>


Applying the stylesheet


The jar containing the MySQL driver is added to the CLASSPATH:
java -cp ${XALAN}/org.apache.xalan_2.7.1.v200905122109.jar:\
${XALAN}/org.apache.xml.serializer_2.7.1.v200902170519.jar:\
mysql-connector-java.jar \
org.apache.xalan.xslt.Process -IN sequences.fasta.xml -XSL tinyseq2go.xsl

Result


<TSeqSet>
<TSeq>
<TSeq_seqtype value="protein"/>
<TSeq_gi>124617</TSeq_gi>
<TSeq_accver>P01308.1</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>RecName: Full=Insulin; Contains: RecName: Full=Insulin B chain; Contains: RecName: Full=Insulin A chain; Flags: Precursor</TSeq_defline>
<TSeq_length>110</TSeq_length>
<TSeq_sequence>MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN</TSeq_sequence>
<GeneOntology>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045721">
<acn>GO:0045721</acn>
<name>negative regulation of gluconeogenesis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0030307">
<acn>GO:0030307</acn>
<name>positive regulation of cell growth</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045597">
<acn>GO:0045597</acn>
<name>positive regulation of cell differentiation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0046889">
<acn>GO:0046889</acn>
<name>positive regulation of lipid biosynthetic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0050995">
<acn>GO:0050995</acn>
<name>negative regulation of lipid catabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045725">
<acn>GO:0045725</acn>
<name>positive regulation of glycogen biosynthetic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0032148">
<acn>GO:0032148</acn>
<name>activation of protein kinase B activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0050731">
<acn>GO:0050731</acn>
<name>positive regulation of peptidyl-tyrosine phosphorylation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0031954">
<acn>GO:0031954</acn>
<name>positive regulation of protein amino acid autophosphorylation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006469">
<acn>GO:0006469</acn>
<name>negative regulation of protein kinase activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0043066">
<acn>GO:0043066</acn>
<name>negative regulation of apoptosis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008284">
<acn>GO:0008284</acn>
<name>positive regulation of cell proliferation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0051897">
<acn>GO:0051897</acn>
<name>positive regulation of protein kinase B signaling cascade</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0030335">
<acn>GO:0030335</acn>
<name>positive regulation of cell migration</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005615">
<acn>GO:0005615</acn>
<name>extracellular space</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045922">
<acn>GO:0045922</acn>
<name>negative regulation of fatty acid metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0042060">
<acn>GO:0042060</acn>
<name>wound healing</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0022898">
<acn>GO:0022898</acn>
<name>regulation of transmembrane transporter activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0046326">
<acn>GO:0046326</acn>
<name>positive regulation of glucose import</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0007186">
<acn>GO:0007186</acn>
<name>G-protein coupled receptor protein signaling pathway</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0032583">
<acn>GO:0032583</acn>
<name>regulation of gene-specific transcription</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0060266">
<acn>GO:0060266</acn>
<name>negative regulation of respiratory burst during acute inflammatory response</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006521">
<acn>GO:0006521</acn>
<name>regulation of cellular amino acid metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0032270">
<acn>GO:0032270</acn>
<name>positive regulation of cellular protein metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045861">
<acn>GO:0045861</acn>
<name>negative regulation of proteolysis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006953">
<acn>GO:0006953</acn>
<name>acute-phase response</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0050709">
<acn>GO:0050709</acn>
<name>negative regulation of protein secretion</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0033861">
<acn>GO:0033861</acn>
<name>negative regulation of NAD(P)H oxidase activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0055089">
<acn>GO:0055089</acn>
<name>fatty acid homeostasis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005159">
<acn>GO:0005159</acn>
<name>insulin-like growth factor receptor binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0014068">
<acn>GO:0014068</acn>
<name>positive regulation of phosphoinositide 3-kinase cascade</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045740">
<acn>GO:0045740</acn>
<name>positive regulation of DNA replication</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045821">
<acn>GO:0045821</acn>
<name>positive regulation of glycolysis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0046628">
<acn>GO:0046628</acn>
<name>positive regulation of insulin receptor signaling pathway</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0042593">
<acn>GO:0042593</acn>
<name>glucose homeostasis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045818">
<acn>GO:0045818</acn>
<name>negative regulation of glycogen catabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0070201">
<acn>GO:0070201</acn>
<name>regulation of establishment of protein localization</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045840">
<acn>GO:0045840</acn>
<name>positive regulation of mitosis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0043410">
<acn>GO:0043410</acn>
<name>positive regulation of MAPKKK cascade</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006006">
<acn>GO:0006006</acn>
<name>glucose metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005975">
<acn>GO:0005975</acn>
<name>carbohydrate metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0046631">
<acn>GO:0046631</acn>
<name>alpha-beta T cell activation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008219">
<acn>GO:0008219</acn>
<name>cell death</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0015758">
<acn>GO:0015758</acn>
<name>glucose transport</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0050715">
<acn>GO:0050715</acn>
<name>positive regulation of cytokine secretion</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045909">
<acn>GO:0045909</acn>
<name>positive regulation of vasodilation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045429">
<acn>GO:0045429</acn>
<name>positive regulation of nitric oxide biosynthetic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0051000">
<acn>GO:0051000</acn>
<name>positive regulation of nitric-oxide synthase activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045908">
<acn>GO:0045908</acn>
<name>negative regulation of vasodilation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005179">
<acn>GO:0005179</acn>
<name>hormone activity</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005158">
<acn>GO:0005158</acn>
<name>insulin receptor binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005576">
<acn>GO:0005576</acn>
<name>extracellular region</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005515">
<acn>GO:0005515</acn>
<name>protein binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005520">
<acn>GO:0005520</acn>
<name>insulin-like growth factor binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0007267">
<acn>GO:0007267</acn>
<name>cell-cell signaling</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0060267">
<acn>GO:0060267</acn>
<name>positive regulation of respiratory burst</name>
<type>biological_process</type>
</GoTerm>
</GeneOntology>

</TSeq>

<TSeq>
<TSeq_seqtype value="protein"/>
<TSeq_gi>3183544</TSeq_gi>
<TSeq_accver>P11940.2</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>RecName: Full=Polyadenylate-binding protein 1; Short=Poly(A)-binding protein 1; Short=PABP 1</TSeq_defline>
<TSeq_length>636</TSeq_length>
<TSeq_sequence>MNPSAPSYPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQQPADAERALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAKEFTNVYIKNFGEDMDDERLKDLFGKFGPALSVKVMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFEQMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAHLTNQYMQRMASVRAVPNPVINPYQPAPPSGYFMAAIPQTQNRAAYYPPSQIAQLRPSPRWTAQGARPHPFQNMPGAIRPAAPRPPFSTMRPASSQVPRVMSTQRVANTSTQTMGPRPAAAAAAATPAVRTVPQYKYAAGVRNPQQHLNAQPQVTMQQPAVHVQGQEPLTASMLASAPPQEQKQMLGERLFPLIQAMHPTLAGKITGMLLEIDNSELLHMLESPESLRSKVDEAVAVLQAHQAKEAAQKAVNSATGVPTV</TSeq_sequence>
<GeneOntology>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0000166">
<acn>GO:0000166</acn>
<name>nucleotide binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0003723">
<acn>GO:0003723</acn>
<name>RNA binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005829">
<acn>GO:0005829</acn>
<name>cytosol</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005634">
<acn>GO:0005634</acn>
<name>nucleus</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005681">
<acn>GO:0005681</acn>
<name>spliceosomal complex</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008380">
<acn>GO:0008380</acn>
<name>RNA splicing</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008143">
<acn>GO:0008143</acn>
<name>poly(A) RNA binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006378">
<acn>GO:0006378</acn>
<name>mRNA polyadenylation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005737">
<acn>GO:0005737</acn>
<name>cytoplasm</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0048255">
<acn>GO:0048255</acn>
<name>mRNA stabilization</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008494">
<acn>GO:0008494</acn>
<name>translation activator activity</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008022">
<acn>GO:0008022</acn>
<name>protein C-terminus binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006397">
<acn>GO:0006397</acn>
<name>mRNA processing</name>
<type>biological_process</type>
</GoTerm>
</GeneOntology>
</TSeq>
</TSeqSet>


Hey, I think it's cool ! :-)

That's it
Pierre