16 May 2008

Twitter m'a tuer ("Twitter killed me")

Just like Paweł Szczęsny (on http://freelancingscience.com), I'm using this blog less and less in favor of Twitter, especially for short posts.

For example, yesterday I posted this on Twitter:

I've also started using FriendFeed.




Pierre

14 April 2008

Bio-Twitters, Unite!

If you're a scientist, a bioinformatician, etc., join the scientific community of the biotwitters on http://twitter.com. Follow @biotecher to find all the biotwitters in one place (thanks, Attila!) and follow me at @yokofakun.

If you don't know Twitter, here is a short video about it:


Pierre

04 April 2008

BerkeleyDB and Hapmap: My notebook.

I'm currently trying to find the best way to store some genotypes. For example, I need to store 278,766,958 Illumina genotypes (marker, individual, allele1, allele2), and MySQL, even with indexes, gets slow when I'm looking for Mendelian incompatibilities. Deepak suggested via Twitter that I use HDF5 (http://hdf.ncsa.uiuc.edu/HDF5/), but as far as I understand the documentation, HDF5 is "just" a smarter implementation of the C functions fseek/fread/fwrite.

I've been looking at the Java Edition of Berkeley DB (BDB) to evaluate its performance for my needs. This engine can be used as an embedded database and doesn't need a database server running as a daemon (just like JavaDB). A BDB database contains records, each made of a "key" and a "value" (a kind of two-column database, or a kind of C++ std::multimap). In this post, I'll describe a program which 1) downloads some genotypes from HapMap, 2) finds the pedigree of the samples, 3) loops over the markers (ordered by their position on the genome) and computes the allele frequencies, and 4) finds the Mendelian incompatibilities.

The source code is available on pastie at:



First we need a few classes:

Class Position holds a position on a chromosome:
private static class Position
{
String chromosome;
int position;
Position(String chromosome,int position)
{
this.chromosome=chromosome;
this.position=position;
}
}


Class Marker contains the information about a SNP:
private static class Marker
{
int rsId;
String alleles;
Position pos;
char strand;
Marker(int rsId) { this.rsId=rsId; }
int getRSId() { return this.rsId; }
}


Class Individual contains the information about an individual, including his 'id' in the Coriell repository (see http://cardiogenomics.med.harvard.edu):
private static class Individual
{
/**Coriell Repository Number*/
String name;
/** id in Coriell */
String individualID=null;
/** father id */
String fatherID=null;
/** mother id */
String motherID=null;
}


As a BDB database is a collection of (key, value) pairs, we need a class MarkerIndividual holding the pair (marker, individual), so that the genotypes can be stored as f(pair(marker, individual)) = genotype:
private static class MarkerIndividual
{
private int rsId;
/**Coriell Repository Number*/
private String individualId;
MarkerIndividual(int rsId,String individualId)
{
this.rsId=rsId;
this.individualId=individualId;
}
}


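Finally, we need a class Genotype to store two alleles:
private static class Genotype
{
char a1,a2;
Genotype(char a1,char a2)
{
this.a1=a1;
this.a2=a2;
}
/** true if this genotype is a1/a2, in either order */
public boolean same(char a1,char a2)
{
return (this.a1==a1 && this.a2==a2) ||
(this.a1==a2 && this.a2==a1)
;
}
@Override
public String toString() { return ""+a1+a2; }
}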


BDB offers several ways to read and write Java objects from/to the databases; each requires the object to be converted into an array of bytes: 1) Java serialization can be used, or 2) a TupleBinding can be implemented: this class must implement two functions, used to encode/decode the object to/from an array of bytes. I chose the latter option; here, for example, is the TupleBinding implementation for the class Individual:

private TupleBinding individualTupleBinding=new TupleBinding()
{
@Override
public Object entryToObject(TupleInput input)
{
Individual indi= new Individual();
indi.name= input.readString();
indi.individualID= input.readString();
indi.fatherID=input.readString();
indi.motherID= input.readString();
return indi;
}

@Override
public void objectToEntry(Object object, TupleOutput output) {
Individual indi= Individual.class.cast(object);
output.writeString(indi.name);
output.writeString(indi.individualID);
output.writeString(indi.fatherID);
output.writeString(indi.motherID);
}
};
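Since the BDB classes are not part of the JDK, here is a stdlib-only sketch of what the (not shown) TupleBinding for MarkerIndividual might look like, using java.io.DataOutputStream/DataInputStream instead of BDB's TupleOutput/TupleInput. The class MarkerIndividualCodec and its methods are hypothetical, and the byte layout is not BDB's own tuple format; it only illustrates the contract of a binding: write the fields in a fixed order, then read them back in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stdlib-only analog of a TupleBinding for the (marker, individual) key. */
final class MarkerIndividualCodec {

    static final class MarkerIndividual {
        final int rsId;
        final String individualId;

        MarkerIndividual(int rsId, String individualId) {
            this.rsId = rsId;
            this.individualId = individualId;
        }
    }

    /** Analog of objectToEntry: write the fields in a fixed order. */
    static byte[] toBytes(MarkerIndividual mi) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(mi.rsId);          // 4 bytes, big-endian
            out.writeUTF(mi.individualId);  // 2-byte length + modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Analog of entryToObject: read the fields back in the same order. */
    static MarkerIndividual fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int rsId = in.readInt();
            String individualId = in.readUTF();
            return new MarkerIndividual(rsId, individualId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = toBytes(new MarkerIndividual(9624968, "NA18940"));
        MarkerIndividual back = fromBytes(key);
        System.out.println(key.length + " bytes -> rs" + back.rsId + " " + back.individualId);
    }
}
```

The only thing that matters is that both methods agree on the field order; swapping the two reads in fromBytes would silently produce garbage.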


To open (or create) a Berkeley DB Environment, I wrote the following code:
EnvironmentConfig envConf= new EnvironmentConfig();
envConf.setAllowCreate(!readOnly);
envConf.setReadOnly(readOnly);
envConf.setTransactional(false);
this.env= new Environment(
file,
envConf
);


Then, we open each database. We need 3 databases: markers, individuals and genotypes. Opening a Database looks like this:

DatabaseConfig dbConfig= new DatabaseConfig();
/* allow create if database does not exists */
dbConfig.setAllowCreate(!readOnly);
dbConfig.setReadOnly(readOnly);
Database db= this.env.openDatabase(null, "database-name", dbConfig);


Although BDB is based on (key, value) pairs, we may want to search on another component of the value, which then needs to be indexed: this is done with a "secondary database". In this project I created secondary databases 1) to find/loop over the markers using their position, and 2) to find/loop over the individuals using individualID instead of name. We need a class implementing SecondaryKeyCreator to extract this secondary key from the primary value. For example, here is the class extracting the Position from a Marker:
class MarkerPositionCreator implements SecondaryKeyCreator
{
public boolean createSecondaryKey(
SecondaryDatabase secDb,
DatabaseEntry keyEntry,
DatabaseEntry dataEntry,
DatabaseEntry resultEntry)
{
Marker marker= (Marker)Test01.this.markerTupleBinding.entryToObject(dataEntry);
Test01.this.positionTupleBinding.objectToEntry(marker.pos, resultEntry);
return true;
}
}
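The secondary cursor used later returns the markers in the order of the bytes of their secondary keys. As far as I understand, BDB compares keys byte by byte by default, and its tuple format is designed so that this byte order follows the natural order of the fields; the positionTupleBinding itself is not shown in the post. The hypothetical class PositionKey below is a stdlib-only sketch of one such sort-preserving encoding, assuming a key made of the chromosome name (null-terminated) followed by the position (big-endian, with its sign bit flipped).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a sort-preserving byte key for (chromosome, position). */
final class PositionKey {

    static byte[] encode(String chromosome, int position) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chrom = chromosome.getBytes(StandardCharsets.UTF_8);
        out.write(chrom, 0, chrom.length);
        out.write(0);                         // terminator: "chr1" sorts before "chr10"
        int flipped = position ^ 0x80000000;  // flip sign bit: negatives sort before positives
        out.write(flipped >>> 24);            // big-endian: most significant byte first
        out.write(flipped >>> 16);
        out.write(flipped >>> 8);
        out.write(flipped);
        return out.toByteArray();
    }

    /** Unsigned, byte-by-byte lexicographic comparison of two keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // same chromosome: ordered by position
        System.out.println(compare(encode("chrMT", 711), encode("chrMT", 712)) < 0); // true
        // the chromosome name is compared as text, so "chr10" comes before "chr2"
        System.out.println(compare(encode("chr10", 1), encode("chr2", 1)) < 0);      // true
    }
}
```

With such an encoding, "ordered on the genome" means ordered by chromosome name as text, then by position; getting chr2 before chr10 would require encoding the chromosome as a number instead.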

We also need to open those "secondary" databases:
SecondaryConfig secondConfig= new SecondaryConfig();
secondConfig.setAllowCreate(!readOnly);
/* on open, if the secondary database is empty then the primary
database is read in its entirety and additions/modifications to the secondary's records occur automatically */
secondConfig.setAllowPopulate(true);
secondConfig.setSortedDuplicates(true);
MarkerPositionCreator posCreator = new MarkerPositionCreator();
/* Identifies the key creator object to be used for secondary key creation. */
secondConfig.setKeyCreator(posCreator);
positionDB = this.env.openSecondaryDatabase(null, "position", this.markerDB,secondConfig);


OK, now for the genetics. The HapMap genotypes are available at http://www.hapmap.org/genotypes/latest/rs_strand/non-redundant/ (the path may change). A file looks like a table f(marker, individual) = genotype:
rs# SNPalleles chrom pos strand genome_build center protLSID assayLSID panelLSID QC_code NA18940 NA18942 NA189
rs28412942 A/T chrMT 410 + ncbi_B36 affymetrix GenomeWideSNP_6.02 Japanese:2 QC+ AA AA AA AA AA AA AA AA AA AA
rs3937039 A/G chrMT 665 + ncbi_b36 broad genotype_protocol_11 Japanese:1 QC+ AA AA AA AA AA AA AA AA AA AA AA
rs2853517 A/G chrMT 711 + ncbi_b36 broad genotype_protocol_11 Japanese:1 QC+ GG GG GG GG GG GG GG GG GG GG GG
rs28358568 C/T chrMT 712 + ncbi_b36 broad genotype_protocol_11 Japanese:1 QC+ TT TT TT TT TT TT TT TT TT TT TT
(...)


The file is processed as follows:

Pattern space=Pattern.compile("[ ]");
String HEADER[]=new String[]{"rs#","SNPalleles","chrom","pos","strand","genome_build","center","protLSID","assayLSID","panelLSID","QC_code"};
BufferedReader in= new BufferedReader(new InputStreamReader(new GZIPInputStream(url.openStream())));

String line= in.readLine();
if(line==null) throw new IOException("Empty file");
/* The first line of this file is the header*/
String header[]=space.split(line);
for(int i=0;i< HEADER.length;++i)
{
if(!header[i].equals(HEADER[i])) throw new IOException("Bad header "+header[i]+" expected "+HEADER[i]);
}
/* the header contains the name of the Individual which will be inserted. At this time we don't know what are the relationships between those individuals.*/
for(int i=HEADER.length;i< header.length;++i)
{
Individual individual= new Individual();
individual.name=header[i];
DatabaseEntry key= new DatabaseEntry(individual.name.getBytes());
DatabaseEntry data= new DatabaseEntry();
this.individualTupleBinding.objectToEntry(individual, data);
getIndividualDB().put(null
,key
,data
);
}
/** the following lines are the markers and the genotypes */
TupleBinding INT_BINDING=TupleBinding.getPrimitiveBinding(Integer.class);
while((line=in.readLine())!=null)
{
if(!line.startsWith("rs")) continue;
String tokens[]=space.split(line);
//System.err.println(line);
/* fill the information of this marker */
Marker marker= new Marker(Integer.parseInt( tokens[0].substring(2)));
marker.alleles= tokens[1];
marker.pos= new Position(tokens[2],Integer.parseInt(tokens[3]));
marker.strand= tokens[4].charAt(0);

DatabaseEntry key= new DatabaseEntry();
INT_BINDING.objectToEntry(marker.getRSId(), key);
DatabaseEntry data= new DatabaseEntry();
this.markerTupleBinding.objectToEntry(marker, data);
getMarkerDB().put(null
,key
,data
);
/** loop over this marker and get the genotypes */
for(int i=HEADER.length;i<header.length;++i)
{
if(tokens[i].length()!=2 || tokens[i].equals("NN")) continue;
/** create the KEY */
MarkerIndividual mi= new MarkerIndividual(marker.rsId,header[i]);
this.markerIndividualTupleBinding.objectToEntry(mi, key);
/** create the genotype */
this.genotypeTupleBinding.objectToEntry(
new Genotype(tokens[i].charAt(0),tokens[i].charAt(1)),
data
);
/** put the new pair( pair(marker,individual), genotype) in the BDB */
getGenotypeDB().put(null
,key
,data
);
}
}
in.close();


Next, I want to find the relationships between those individuals; this information is available here. For each "Coriell Repository Number", we look up the individual in our database; if it exists, we add the pedigree information and write the individual back to the database (see the function makePedigree, line 466 of the source code).

To retrieve the genotype g = f(marker, individual), I wrote the following simple utility function, getGenotypeAt:

Genotype getGenotypeAt(Marker marker,Individual indi) throws DatabaseException
{
if(marker==null || indi==null) return null;
DatabaseEntry key=new DatabaseEntry();
DatabaseEntry value=new DatabaseEntry();
this.markerIndividualTupleBinding.objectToEntry(new MarkerIndividual(marker.rsId,indi.name),key);
if(this.genotypeDB.get(null, key, value, LockMode.DEFAULT)!=OperationStatus.SUCCESS) return null;
Genotype g= Genotype.class.cast(this.genotypeTupleBinding.entryToObject(value));
return g;
}



To get the allele frequencies, we loop over each marker (using the secondary database, so that the markers are ordered by their position on the genome rather than by rs number) and fetch the genotype of each individual. To loop over a BDB database, a Cursor (which looks like a java.util.Iterator) is used.
void frequencies() throws DatabaseException
{
SecondaryCursor cM= this.positionDB.openSecondaryCursor(null, null);
DatabaseEntry key=new DatabaseEntry();
DatabaseEntry value=new DatabaseEntry();
while(cM.getNext(key, value,LockMode.DEFAULT)==OperationStatus.SUCCESS)
{
Marker m= Marker.class.cast(this.markerTupleBinding.entryToObject(value));
HashMap<Character, Integer> allele2count= new HashMap<Character, Integer>();
int totalGenotyped=0;
int totalFailures=0;
System.out.print("rs"+m.rsId+" "+m.alleles+" "+m.pos.chromosome+" "+m.pos.position+" "+m.strand);
Cursor cI= this.individualDB.openCursor(null, null);
while(cI.getNext(key, value,LockMode.DEFAULT)==OperationStatus.SUCCESS)
{
Individual indi=Individual.class.cast(this.individualTupleBinding.entryToObject(value));
Genotype g= getGenotypeAt(m, indi);
if(g!=null)
{
totalGenotyped++;
for(int i=0;i< 2;++i)
{
char c= (i==0?g.a1:g.a2);
Integer count= allele2count.get(c);
if(count==null) count=0;
allele2count.put(c,count+1);
}
}
else
{
totalFailures++;
}
}
cI.close();
System.out.print(" genotyped:"+(int)(100.0*totalGenotyped/(totalGenotyped+totalFailures))+"%");
for(Character allele: allele2count.keySet())
{
System.out.print(" f("+allele+")="+allele2count.get(allele)/(2.0*totalGenotyped));
}
System.out.println();
}
cM.close();
}
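The per-marker arithmetic in frequencies() does not depend on BDB. Here is a hypothetical stdlib-only sketch (the class AlleleStats is not from the post) that takes one marker's genotype calls written as in the HapMap file ("AG", or "NN" for a missing call) and computes the allele frequencies (allele count divided by twice the number of genotyped individuals) and the fraction of individuals that were genotyped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: allele frequencies and call rate for one marker. */
final class AlleleStats {
    final Map<Character, Double> frequencies = new TreeMap<>();
    final double callRate;

    /** calls: one two-letter genotype per individual, "NN" meaning not genotyped */
    AlleleStats(List<String> calls) {
        Map<Character, Integer> counts = new TreeMap<>();
        int genotyped = 0;
        for (String call : calls) {
            if (call.length() != 2 || call.equals("NN")) continue; // same filter as the loader
            genotyped++;
            for (char allele : call.toCharArray()) {
                counts.merge(allele, 1, Integer::sum);
            }
        }
        // each genotyped (diploid) individual carries two alleles
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            frequencies.put(e.getKey(), e.getValue() / (2.0 * genotyped));
        }
        // genotyped individuals out of all individuals
        callRate = calls.isEmpty() ? 0.0 : genotyped / (double) calls.size();
    }

    public static void main(String[] args) {
        AlleleStats s = new AlleleStats(Arrays.asList("AA", "AG", "NN", "GG"));
        System.out.println(s.frequencies + " callRate=" + s.callRate);
    }
}
```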



Finding the Mendelian incompatibilities works much the same way. I used brute force here: we loop over each individual and, for each one, over each marker. If the individual has at least one known parent, we check that his genotype is compatible with the parents' genotypes.
void incompats() throws DatabaseException
{
DatabaseEntry key=new DatabaseEntry();
DatabaseEntry value=new DatabaseEntry();

Cursor cI= this.individualDB.openCursor(null, null);
while(cI.getNext(key, value,LockMode.DEFAULT)==OperationStatus.SUCCESS)
{
Individual indi=Individual.class.cast(this.individualTupleBinding.entryToObject(value));

if(indi.fatherID==null && indi.motherID==null)
{
continue;
}

Individual father= findIndividualByCorrielId(indi.fatherID);
Individual mother= findIndividualByCorrielId(indi.motherID);

Cursor cM= this.markerDB.openCursor(null, null);
while(cM.getNext(key, value,LockMode.DEFAULT)==OperationStatus.SUCCESS)
{
Marker m= Marker.class.cast(this.markerTupleBinding.entryToObject(value));
Genotype gChild= getGenotypeAt(m, indi);
Genotype gFather= getGenotypeAt(m, father);
if(isIncompat(gChild,gFather))
{
System.out.println("Incompat: for rs"+m.getRSId()+"("+m.alleles+") "+
indi.individualID+" is "+gChild+" and his father "+indi.fatherID+" is "+
gFather
);
continue;
}
Genotype gMother= getGenotypeAt(m, mother);
if(isIncompat(gChild,gMother))
{
System.out.println("Incompat: for rs"+m.getRSId()+"("+m.alleles+") "+
indi.individualID+" is "+gChild+" and his mother "+indi.motherID+" is "+
gMother
);
continue;
}
if(isIncompat(gChild,gFather,gMother))
{
System.out.println("Incompat: for rs"+m.getRSId()+"("+m.alleles+") "+
indi.individualID+" is "+gChild+
" and his father "+indi.fatherID+" is "+ gFather+" and his mother "+indi.motherID+" is "+ gMother
);
}
}
cM.close();
}
cI.close();
}
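The isIncompat functions called above are not shown in the post (they are in the full source). Here is a hypothetical sketch of what they could check, for autosomal markers and assuming a Genotype class like the one above (two alleles a1/a2 and same()): with one parent, the child must carry at least one of that parent's alleles; with both parents, one paternal and one maternal allele must rebuild the child's genotype. A missing genotype (null) is treated as "cannot tell", not as an incompatibility.

```java
/** Hypothetical sketch of Mendelian compatibility checks. */
final class Mendel {

    static final class Genotype {
        final char a1, a2;

        Genotype(char a1, char a2) {
            this.a1 = a1;
            this.a2 = a2;
        }

        /** true if this genotype is x/y in either order */
        boolean same(char x, char y) {
            return (a1 == x && a2 == y) || (a1 == y && a2 == x);
        }

        boolean carries(char allele) {
            return a1 == allele || a2 == allele;
        }

        @Override
        public String toString() {
            return "" + a1 + a2;
        }
    }

    /** One parent: the parent must have transmitted one of the child's two alleles. */
    static boolean isIncompat(Genotype child, Genotype parent) {
        if (child == null || parent == null) return false; // missing data
        return !parent.carries(child.a1) && !parent.carries(child.a2);
    }

    /** Trio: one paternal and one maternal allele must rebuild the child's genotype. */
    static boolean isIncompat(Genotype child, Genotype father, Genotype mother) {
        if (child == null || father == null || mother == null) return false; // missing data
        char[] fromFather = {father.a1, father.a2};
        char[] fromMother = {mother.a1, mother.a2};
        for (char f : fromFather) {
            for (char m : fromMother) {
                if (child.same(f, m)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Genotype cc = new Genotype('C', 'C');
        Genotype ct = new Genotype('C', 'T');
        Genotype tt = new Genotype('T', 'T');
        System.out.println(isIncompat(cc, tt));     // true: child CC, father TT
        System.out.println(isIncompat(ct, tt));     // false: each parent alone is compatible...
        System.out.println(isIncompat(ct, tt, tt)); // true: ...but the trio is not
    }
}
```

The trio check is what catches cases such as "CT with father TT and mother TT" in the output below, which the single-parent checks cannot detect.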


That's it. I first tested chromosome 1 (http://www.hapmap.org/genotypes/latest/rs_strand/non-redundant/genotypes_chr1_CEU_r23a_nr.b36.txt.gz, 11 MB), but I pressed Ctrl-C when the files reached 1.4 GB!
I then used the chr22 file, downloaded directly to my computer. The space required by BerkeleyDB to hold the databases and the indexes was 721 MB, whereas the gzipped source file was 2 MB (25 MB unzipped)! (Arghhhhhhhhhhhh!)

  • Individuals count: 90

  • Markers count: 54,786

  • Genotypes count: 4,853,237
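A rough back-of-the-envelope reading of these numbers, assuming 1 MB = 10^6 bytes and charging the whole 721 MB (markers, individuals and secondary indexes included) to the genotype records:

```latex
\frac{721 \times 10^{6}\ \text{bytes}}{4\,853\,237\ \text{genotypes}} \approx 149\ \text{bytes per genotype},
\qquad
\frac{25 \times 10^{6}\ \text{bytes}}{4\,853\,237\ \text{genotypes}} \approx 5.2\ \text{bytes per genotype}
```

So the database is roughly 29 times larger than the uncompressed text file (721/25 ≈ 28.8), even though each genotype only holds two alleles.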


Time required to load the HapMap file: 1174 secs (~20 min)

rs9624968 A/G chr22 24783030 + genotyped:86% f(G)=0.879746835443038 f(A)=0.12025316455696203
rs9624969 C/T chr22 24784595 + genotyped:87% f(T)=0.075 f(C)=0.925
rs6004919 C/T chr22 24785216 + genotyped:100% f(T)=0.12777777777777777 f(C)=0.8722222222222222
rs4585127 A/G chr22 24785559 + genotyped:100% f(G)=0.8722222222222222 f(A)=0.12777777777777777
rs5752262 A/G chr22 24786367 + genotyped:95% f(G)=0.5116279069767442 f(A)=0.4883720930232558
rs16981296 C/G chr22 24787784 + genotyped:95% f(G)=0.8488372093023255 f(C)=0.1511627906976744
rs1003547 G/T chr22 24788134 + genotyped:86% f(T)=0.44936708860759494 f(G)=0.5506329113924051
rs9613094 A/G chr22 24788388 + genotyped:100% f(G)=0.2222222222222222 f(A)=0.7777777777777778
rs16986627 A/G chr22 24789298 + genotyped:88% f(G)=0.2654320987654321 f(A)=0.7345679012345679
(...)


Time required to compute the frequencies: 955 secs (~16 min)

Incompat: for rs133457(C/T) 1341M02 is CC and his father 1341MF13 is TT
Incompat: for rs136009(C/T) 1341M02 is CT and his father 1341MF13 is TT and his mother 1341MM14 is TT
Incompat: for rs394518(C/T) 1341M02 is CT and his father 1341MF13 is TT and his mother 1341MM14 is TT
Incompat: for rs628437(A/C) 1341M02 is AC and his father 1341MF13 is CC and his mother 1341MM14 is CC
Incompat: for rs731403(C/G) 1341M02 is GG and his mother 1341MM14 is CC
Incompat: for rs1122940(C/T) 1341M02 is CT and his father 1341MF13 is TT and his mother 1341MM14 is TT
Incompat: for rs2845343(A/T) 1341M02 is AA and his mother 1341MM14 is TT
Incompat: for rs4822498(C/T) 1341M02 is CC and his father 1341MF13 is TT
Incompat: for rs4822499(C/T) 1341M02 is TT and his father 1341MF13 is CC
Incompat: for rs4823195(A/G) 1341M02 is AA and his mother 1341MM14 is GG
Incompat: for rs4991267(C/T) 1341M02 is CC and his mother 1341MM14 is TT
Incompat: for rs5755047(A/T) 1341M02 is TT and his father 1341MF13 is AA
Incompat: for rs5755343(A/T) 1341M02 is AA and his mother 1341MM14 is TT
Incompat: for rs5755420(A/C) 1341M02 is AC and his father 1341MF13 is CC and his mother 1341MM14 is CC
Incompat: for rs5765056(C/T) 1341M02 is CT and his father 1341MF13 is CC and his mother 1341MM14 is CC
Incompat: for rs5765436(A/C) 1341M02 is AC and his father 1341MF13 is CC and his mother 1341MM14 is CC
Incompat: for rs5765499(C/T) 1341M02 is CC and his father 1341MF13 is TT
Incompat: for rs5768636(G/T) 1341M02 is GT and his father 1341MF13 is GG and his mother 1341MM14 is GG
Incompat: for rs5769710(C/T) 1341M02 is CC and his father 1341MF13 is TT
Incompat: for rs5770600(A/C) 1341M02 is CC and his father 1341MF13 is AA
Incompat: for rs5997220(A/G) 1341M02 is AG and his father 1341MF13 is GG and his mother 1341MM14 is GG
(...)
764 errors


Time required to find the Mendelian incompatibilities: 11021 secs (~3 h)

When the program closed, the database was compacted down to 710 MB.

Conclusion: too slow, and huge files generated. Definitely a bad choice for handling this kind of data.

03 April 2008

Study Collaborators Included in PubMed

Via NLM Technical Bulletin:

As of November 2007, there were over 57,000 occurrences of group (corporate) authors in MEDLINE/PubMed with over 17,000 citations with no co-occurring personal authors. Not everyone involved in a group is actually writing or authoring the paper, however. NLM agrees (...) that "Authorship credit should be based on

  1. substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data
  2. drafting the article or revising it critically for important intellectual content
  3. final approval of the version to be published
Taking these and other factors into consideration, NLM decided that it is time to include the individual names associated with the group authors in MEDLINE/PubMed. Therefore, when a group name is included as an author, the respective group member names appearing in the article will be acknowledged as collaborators but not associated with authorship. This significant enhancement allows PubMed users to identify articles to which an individual has contributed, whether as an author or as a collaborator. NLM has implemented this new feature routinely for MEDLINE citations created beginning in March 2008 if the article was published in 2008 forward.



17 March 2008

xul4wikipedia

I've added a few more individuals to my History of Science, and I've also tried to generate an iCal version of this dataset to display the birth/death dates of all those persons (http://lindenb.integragen.org/xulhistory/history.ical). However, there is a bug in this file: the events are not correctly displayed in Google Calendar. Does anyone know why?

There is now a beautiful new version of Freebase, but it is a little bit slower, and as I want to edit a large number of individuals, the procedure now takes too much time for me. I received a kind mail from Robert Cook of Metaweb about this problem, telling me that they're working on this issue. I also noticed that, just like Wikipedia, they are now much more concerned about the origin of the pictures. That's fair, but I wish I could add a picture to all those individuals :-) . I could draw them, but there would still be a problem of rights :-)

Meanwhile, I've added some infoboxes in Wikipedia. I also created a simple web form at http://lindenb.integragen.org/xul4wikipedia/xul4wikipedia.cgi to create a Firefox extension on the fly. This add-on appends some custom items to the contextual popup menu when editing an article in Wikipedia. Each of those items inserts a custom text in the textarea of the edited article; for example, you won't have to find, copy and paste your favorite Template:Infobox Person: this template will always be available in your menu. The source code is available here and is broadly inspired by one of my previous posts.

Pierre

16 March 2008

pubmed xml references to wikipedia ref

I wrote a simple XSL stylesheet which transforms the XML output of PubMed into a Wikipedia <ref>.
The XSL file is available here:

http://lindenb.googlecode.com/svn/trunk/src/xsl/pubmed2wiki.xsl

I used it with an article about Sir John Robert Vane who discovered the action of aspirin on prostaglandin biosynthesis.
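The post doesn't say how the stylesheet is applied. From Java, one way is the standard javax.xml.transform (JAXP) API; here is a minimal sketch. The class XslApply and the file names in the usage comment are hypothetical; the stylesheet is assumed to be pubmed2wiki.xsl and the input a PubMed XML record saved to disk.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper applying an XSLT stylesheet (such as pubmed2wiki.xsl) to an XML document. */
final class XslApply {

    static String transform(Source xml, Source xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
        StringWriter out = new StringWriter();
        transformer.transform(xml, new StreamResult(out));
        return out.toString();
    }

    /** Convenience overload for in-memory strings. */
    static String transform(String xml, String xsl) {
        try {
            return transform(new StreamSource(new StringReader(xml)),
                             new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    // usage (hypothetical file names): java XslApply pubmed2wiki.xsl pubmed_result.xml
    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(new StreamSource(new File(args[1])),
                                     new StreamSource(new File(args[0]))));
    }
}
```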

10 March 2008

Custom Search Engine For Bioinformatics

Mostly because I need them all the time, I wrote a few "Custom Search" engines for dbSNP, HapMap and the UCSC Genome Browser. Those engines are available at http://lindenb.integragen.org/opensearch/opensearch.html.

Pierre

19 February 2008

Freebase Wikipedia Extraction (WEX)

Via the Freebase blog.

The Freebase Wikipedia Extraction (WEX) http://download.freebase.com/wex/ is a processed dump of the English language Wikipedia. The wiki markup for each article is transformed into machine-readable XML, and common relational features such as templates, infoboxes, categories, article sections, and redirects are extracted in tabular form.

Freebase WEX is provided as a set of database tables in TSV format for PostgreSQL, along with tables providing mappings between Wikipedia articles and Freebase topics, and corresponding Freebase Types





Pierre

14 February 2008

Freebase and the History of Sciences

(feed readers, this post is better displayed on the web site)
I've been looking for a way to get a structured description of the biographies of scientists throughout history. One of my investigations led to wikistory, a Web Start application based on the data extracted from Wikipedia by the DBPedia project.

History of Sciences / Freebase


However, the data collected by DBPedia are mostly based on the infoboxes, and most of those are missing or incomplete (for example, this is OK for Darwin (happy birthday!) but there is no infobox for Georges-Louis Leclerc, Comte de Buffon (last accessed February 14th 2008, 20:46)). Moreover, the information stored in those infoboxes lacks the fields I needed: gender, a short biography, parents, children, etc.

I eventually decided to go back to Freebase, which I had tested a few months ago and which was also presented at SciFoo:

Image from dchud.


A screenshot of my final result is presented below, and you can test this interface here:

This is an interactive XUL page (it will only work with Firefox) with a timeline containing a few hundred scientists.


Just for fun, I also generated a time-based KML file for Google Earth with this data.





The source code used to create this XUL page is available here.

Here is how I proceeded:
On Freebase, I created my own type "scientist" with some fields such as "short bio", "known for", "students", etc. (I don't know how to define 'inverse properties' in Freebase: if A was the teacher of B, how can I automatically say that B was the student of A?). This type, scientist, was added to some of the Freebase records, which I then completed (I think that Freebase also parsed the infoboxes in Wikipedia to build their database; that is why most of their records are almost empty). For example, see Buffon.


All the persons associated with my type scientist can be retrieved using the following MQL query:


{"qname1":{"query":[{"guid":null,"type":"/user/lindenb/default_domain/scientist"}]}}



The result looks like this...

{
"status": "200 OK",
"code": "/api/status/ok",
"qname1": {
"code": "/api/status/ok",
"result": [
{
"guid": "#9202a8c04000641f800000000000cb7c",
"type": "/user/lindenb/default_domain/scientist"
},
{
"guid": "#9202a8c04000641f800000000000f65e",
"type": "/user/lindenb/default_domain/scientist"
},

(...)
{
"guid": "#9202a8c04000641f80000000003b7d80",
"type": "/user/lindenb/default_domain/scientist"
},
{
"guid": "#9202a8c04000641f80000000003bd1ef",
"type": "/user/lindenb/default_domain/scientist"
}
]
}
}



For each guid, we can retrieve the types associated with the record:


{"qname1":{"query":{"guid":"#9202a8c04000641f800000000000cb7c","type":[]}}}


The types associated with the record #9202a8c04000641f800000000000cb7c were returned as follows:

{
"status": "200 OK",
"code": "/api/status/ok",
"qname1": {
"code": "/api/status/ok",
"result": {
"guid": "#9202a8c04000641f800000000000cb7c",
"type": [
"/common/topic",
"/people/person",
"/people/deceased_person",
"/book/author",
"/user/mikelove/default_domain/influence_node",
"/user/lindenb/default_domain/scientist",
"/award/award_winner"
]
}
}
}


For each type, I fetched the fields of this record:

{"qname1":{"query":{"guid":"#9202a8c04000641f800000000000cb7c","*":null,"type":"/people/person"}}}


Here is the response from Freebase:

{
"status": "200 OK",
"code": "/api/status/ok",
"qname1": {
"code": "/api/status/ok",
"result": {
"creator": "/user/metaweb",
"profession": [
"Naturalist",
"Biologist",
"Geologist"
],
"places_lived": [
null,
null,
null,
null
],
"education": [
null,
null
],
"children": [
"George Howard Darwin",
"Horace Darwin"
],
"guid": "#9202a8c04000641f800000000000cb7c",
"employment_history": [],
"id": "/topic/en/charles_darwin",
"religion": [
"Agnosticism",
"Christianity",
"Unitarianism",
"Church of England"
],
"date_of_birth": "1809-02-12",
"parents": [
"Robert Darwin",
"Susannah Darwin"
],
"metaweb_user_s": [],
"type": "/people/person",
"attribution": "/user/metaweb",
"permission": "/boot/all_permission",
"timestamp": "2006-10-22T08:53:38.0061Z",
"signature": [],
"weight_kg": null,
"key": [
"Charles_Darwin",
"Charles_Robert_Darwin",
"Darwin$0027s",
"Mary_Darwin",
"Darwin$002C_Charles",
"C$002E_R$002E_Darwin",
"Charles_R$002E_Darwin",
"8145410",
"Charles_Darwin$0027s",
"Charles_darwin",
"charles_darwin",
"CR_Darwin",
"Charles_R_Darwin",
"71b891f5-92bb-42be-9c45-98b8f56a3177"
],
"nationality": [
"United Kingdom"
],
"spouse_s": [
null
],
"name": "Charles Darwin",
"gender": "Male",
"sibling_s": [
null
],
"height_meters": null,
"place_of_birth": "Shrewsbury",
"quotations": []
}
}
}


And so on: using this kind of query, I was able to fetch the birth dates, the geographical coordinates of the places, the pictures, etc.
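The three query shapes used in this post (list the guids having a type, list the types of a guid, fetch all the fields "*" of a guid for one type) follow a fixed pattern. Here is a hypothetical Java sketch (the class MqlQueries is not from the post) that only builds the query strings exactly as shown above; sending them to Freebase and parsing the JSON responses is left out.

```java
/** Hypothetical helper building the three MQL query envelopes used in this post. */
final class MqlQueries {

    /** Quote a value as a JSON string literal (escapes backslashes and double quotes only). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Wrap a query in the {"qname1":{"query": ...}} envelope. */
    static String envelope(String query) {
        return "{\"qname1\":{\"query\":" + query + "}}";
    }

    /** Step 1: every record (guid) having the given type. */
    static String guidsOfType(String type) {
        return envelope("[{\"guid\":null,\"type\":" + quote(type) + "}]");
    }

    /** Step 2: every type of one record; the empty list asks for all values. */
    static String typesOf(String guid) {
        return envelope("{\"guid\":" + quote(guid) + ",\"type\":[]}");
    }

    /** Step 3: all the fields ("*") of one record, seen through one type. */
    static String fieldsOf(String guid, String type) {
        return envelope("{\"guid\":" + quote(guid) + ",\"*\":null,\"type\":" + quote(type) + "}");
    }

    public static void main(String[] args) {
        String guid = "#9202a8c04000641f800000000000cb7c";
        System.out.println(guidsOfType("/user/lindenb/default_domain/scientist"));
        System.out.println(typesOf(guid));
        System.out.println(fieldsOf(guid, "/people/person"));
    }
}
```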

The result is available here:

History of Sciences / Freebase

http://lindenb.integragen.org/xulhistory/history.php


That's it.
Pierre

02 February 2008

Creating a XUL extension for Mozilla/Firefox: my notebook.

(RSS readers, this file is better displayed on my blog)
Here is my notebook on how to create an extension for Firefox. The following example was tested with Firefox 2.0.0.11. This extension is used to insert a few default templates (such as Template:Infobox_scientist) when editing a biography on Wikipedia. Infoboxes are used, for example by DBPedia, to create a structured version of Wikipedia.

First, create a new Firefox profile, say TEST, by invoking firefox with the option '-P':

firefox -P

Set up your extension development environment as described here.

I'm now working in the directory ~/XUL:

Create the file ./install.rdf. It's an RDF file describing your extension:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:em="http://www.mozilla.org/2004/em-rdf#">

<rdf:Description about="urn:mozilla:install-manifest">
<!-- my extension ID -->
<em:id>biography-helper@plindenbaum.com</em:id>
<!-- version -->
<em:version>2.0</em:version>
<!-- this is a firefox extension -->
<em:type>2</em:type>

<em:targetApplication>
<rdf:Description>
<!-- this is for firefox -->
<em:id>{ec8030f7-c20a-464f-9b0e-13a3a9e97384}</em:id>
<!-- min/max firefox version -->
<em:minVersion>2.0</em:minVersion>
<em:maxVersion>2.0.0.*</em:maxVersion>
</rdf:Description>
</em:targetApplication>

<!-- name -->
<em:name>Wikipedia Edit Helper!</em:name>
<!-- description -->
<em:description>An Extension for Editing biographies in Wikipedia</em:description>
<!-- author -->
<em:creator>Pierre Lindenbaum</em:creator>
<!-- contact -->
<em:homepageURL>http://plindenbaum.blogspot.com</em:homepageURL>
<!-- icon -->
<em:iconURL>chrome://wiki4biography/skin/darwin32.png</em:iconURL>
</rdf:Description>
</rdf:RDF>


The file ./chrome/content/menu.xul is the XUL interface which will be added to the contextual popup-menu.

<?xml version="1.0" encoding="UTF-8"?>
<overlay id="wiki4biography" xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
<script src="library.js"/>

<popup id="contentAreaContextMenu">
<menuseparator/>
<menu label="Wikipedia" id="menuWikipedia">
<menupopup>

<menuitem label="Infobox Scientist" oncommand="MY.infobox()" />

<menu label="Categories">
<menupopup>
<menuitem label="Astronomers" oncommand="MY.category('Astronomers')"/>
<menuitem label="Biologists" oncommand="MY.category('Biologists')"/>
<menuitem label="Chemists" oncommand="MY.category('Chemists')"/>
<menuitem label="Physicists" oncommand="MY.category('Physicists')"/>
</menupopup>
</menu>

<menu label="Stubs">
<menupopup>
<menuitem label="Astronomer" oncommand="MY.insertTemplate('{{astronomer-stub}}')"/>
<menuitem label="Chemist" oncommand="MY.insertTemplate('{{chemist-stub}}')"/>
<menuitem label="Biologist" oncommand="MY.insertTemplate('{{biologist-stub}}')"/>
<menuitem label="Mathematician" oncommand="MY.insertTemplate('{{mathematician-stub}}')"/>
<menuitem label="Physicist" oncommand="MY.insertTemplate('{{physicist-stub}}')"/>
</menupopup>
</menu>

</menupopup>
</menu>

</popup>
</overlay>



The script used by our menu is ./chrome/content/library.js:
var MY={
/** when the xul page is loaded, register for events from the contextual popupmenu */
onload:function()
{
var element = document.getElementById("contentAreaContextMenu");
element.addEventListener("popupshowing",function(evt){MY.preparePopup(evt);},true);
},
/* prepare the contextual menu just before it is showing on screen: hide or show our menu */
preparePopup:function(evt)
{
var element = document.getElementById("menuWikipedia");
if(document.popupNode.id!="wpTextbox1")
{
element.hidden=true;
return;
}
element.hidden=false;
},
/** insert a text at the caret position in the textarea of wikipedia */
insertTemplate:function(text)
{
var area= content.document.getElementById("wpTextbox1");
if(area==null) return;
//alert(area.value.substring(0,20)+" "+area.tagName);
var selstart=area.selectionStart;
var x= area.scrollLeft;
var y= area.scrollTop;
area.value= area.value.substring(0,selstart)+
text+
area.value.substring(area.selectionEnd)
;
area.scrollLeft=x;
area.scrollTop=y;
selstart+=text.length;
area.setSelectionRange(selstart,selstart);
},
/* insert a wikipedia category */
category:function(text)
{
MY.insertTemplate("[[Category:"+text+"]]");
},
/** get current article name */
article:function()
{
var url=""+content.document.location;
var i=url.indexOf("title=",0);
if(i==-1) return "";
i+=6;
var j=url.indexOf("&action",i);
if(j==-1) return "";
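/* decode the UTF-8 %-escapes and replace every underscore (a plain string pattern would only replace the first one) */
return decodeURIComponent(url.substr(i,j-i)).replace(/_/g," ");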
},
/* insert an infobox */
infobox:function()
{
var box="{{Infobox Scientist\n"+
"|name = "+MY.article()+"\n"+
"|box_width =\n"+
"|image = No_free_image_man_%28en%29.svg\n"+ /** sorry, most scientists in wikipedia are men */
"|image_width = 200px\n"+
"|caption = "+MY.article()+"\n"+
"|birth_date = \n"+
"|birth_place = \n"+
"|death_date = \n"+
"|death_place = \n"+
"|residence = \n"+
"|citizenship = \n"+
"|nationality = \n"+
"|ethnicity = \n"+
"|field = \n"+
"|work_institutions = \n"+
"|alma_mater = \n"+
"|doctoral_advisor = \n"+
"|doctoral_students = \n"+
"|known_for = \n"+
"|author_abbrev_bot = \n"+
"|author_abbrev_zoo = \n"+
"|influences = \n"+
"|influenced = \n"+
"|prizes = \n"+
"|footnotes = \n"+
"|signature =\n"+
"}}\n";
MY.insertTemplate(box);
}
};
/* initialize all this stuff */
window.addEventListener("load",MY.onload, false);


The icon ./chrome/skin/darwin32.png is used as an icon for the extension.

The file ./chrome.manifest declares which chrome packages and overlays this extension provides:
content wiki4biography chrome/content/
overlay chrome://browser/content/browser.xul chrome://wiki4biography/content/menu.xul
skin wiki4biography classic/1.0 chrome/skin/


To test this extension, a file ${HOME}/.mozilla/firefox/testmozilla/extensions/biography-helper@plindenbaum.com is created (its name is the extension ID declared in install.rdf). This file contains the path to the XUL folder:
/home/pierre/tmp/XUL/

You can test the extension by invoking firefox with the profile "TEST":
firefox -no-remote -P TEST


When your extension is ready, you can package it into a *.xpi archive:
zip -r wikipedia.zip chrome chrome.manifest install.rdf
mv wikipedia.zip wikipedia.xpi


That's it. You can download this extension at http://lindenb.integragen.org/xul/wikipedia.xpi and open it with Firefox, which will ask whether you want to install it. Then, edit an article in Wikipedia and right-click in the edit box to get the new contextual menu.


Pierre