Showing posts with label wiki. Show all posts

24 April 2012

Mapping the genes involved in a category of disease: the GeneWikiPlus + SPARQL way.

In my previous post, I used the RDF/XML files of the Disease Ontology to map all the genes involved in cardiac diseases.

Andrew Su immediately mentioned on Twitter that he was working on GeneWiki+, an integration of GeneWiki on Semantic-MediaWiki that could answer the same question.





Later, Benjamin Good announced that a SPARQL endpoint for GeneWiki+ was now available:


The following Java code uses the Jena/ARQ API to query this SPARQL endpoint. For a given Disease Ontology accession identifier, it fetches all the genes associated with this disease, then runs recursively on the sub-classes of this disease.
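The recursion described above can be sketched without the endpoint. This is only an illustration, not the Jena/ARQ program of this post: the two maps are hypothetical stand-ins for the two SPARQL queries (sub-classes of a disease, genes of a disease), the "example:heart-failure" identifier is a placeholder, and the visited set is an added precaution because an ontology term can be reached through more than one parent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of the recursive walk; the maps replace the SPARQL queries (assumption). */
final class DiseaseGeneWalker {
    /** disease id -> ids of its direct sub-classes (hypothetical stand-in for a query). */
    private final Map<String, List<String>> subClasses;
    /** disease id -> "gene-name TAB gene-id TAB disease" rows (hypothetical stand-in). */
    private final Map<String, List<String>> genes;

    DiseaseGeneWalker(Map<String, List<String>> subClasses, Map<String, List<String>> genes) {
        this.subClasses = subClasses;
        this.genes = genes;
    }

    /** Returns the rows of the disease followed by the rows of all its descendants. */
    List<String> collect(String doid) {
        List<String> rows = new ArrayList<>();
        collect(doid, new HashSet<>(), rows);
        return rows;
    }

    private void collect(String doid, Set<String> seen, List<String> rows) {
        if (!seen.add(doid)) return; // term already visited through another parent
        rows.addAll(genes.getOrDefault(doid, List.of()));
        for (String child : subClasses.getOrDefault(doid, List.of())) {
            collect(child, seen, rows);
        }
    }

    public static void main(String[] args) {
        // "example:heart-failure" is a placeholder id, not a real accession.
        DiseaseGeneWalker walker = new DiseaseGeneWalker(
            Map.of("DOID:114", List.of("example:heart-failure")),
            Map.of("DOID:114", List.of("PROC\t5624\tHeart disease"),
                   "example:heart-failure", List.of("FOXP1\t27086\tHeart failure")));
        walker.collect("DOID:114").forEach(System.out::println);
    }
}
```

With a real endpoint, each map lookup would be replaced by one query per disease identifier.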



Here is the output (gene-name, gene-id, disease) with DOID:114 ("Heart Disease"):
Protein C 5624 Heart disease
HMG-CoA reductase 3156 Heart disease
SCARB1 949 Heart disease
Coagulation factor II receptor 2149 Heart disease
Cathepsin S 1520 Heart disease
ABCA1 19 Heart disease
CHD7 55636 Heart disease
GJA5 2702 Heart disease
ENTPD1 953 Heart disease
PEDF 5176 Heart disease
HMG CoA reductase 3156 Heart disease
PROC 5624 Heart disease
F2R 2149 Heart disease
SERPINF1 5176 Heart disease
HMGCR 3156 Heart disease
CTSS 1520 Heart disease
Cytochrome c 54205 Heart failure
FOXP1 27086 Heart failure
Vasoactive intestinal peptide 7432 Heart failure
Angiotensin-converting enzyme 1636 Heart failure
PPP1CA 5499 Heart failure
Transferrin 7018 Heart failure
Natriuretic peptide precursor C 4880 Heart failure
Insulin-like growth factor 1 3479 Heart failure
CA-125 94025 Heart failure
Myosin binding protein C, cardiac 4607 Heart failure
MYH7 4625 Heart failure
Tafazzin 6901 Heart failure
5-HT2B receptor 3357 Heart failure
Beta-1 adrenergic receptor 153 Heart failure
PTGS2 5743 Heart failure
EPAS1 2034 Heart failure
Nociceptin receptor 4987 Heart failure
Cystatin C 1471 Heart failure
Ryanodine receptor 2 6262 Heart failure
Multidrug resistance-associated protein 2 1244 Heart failure
KCNA5 3741 Heart failure
ANXA6 309 Heart failure
CMA1 1215 Heart failure
KLF15 28999 Heart failure
IL1RL1 9173 Heart failure
JPH2 57158 Heart failure
Heart-type fatty acid binding protein 2170 Heart failure
TF 7018 Heart failure
ABCC2 1244 Heart failure
Cytochrome-c 54205 Heart failure
HTR2B 3357 Heart failure
Cytochrome C 54205 Heart failure
Hif2a 2034 Heart failure
FABP3 2170 Heart failure
MYBPC3 4607 Heart failure
Angiotensin converting enzyme 1636 Heart failure
IGF-1 3479 Heart failure
Insulin-like growth factor-1 3479 Heart failure
Stress-induced polymorphic ventricular tachycardia 6262 Heart failure
C-type natriuretic peptide 4880 Heart failure
OPRL1 4987 Heart failure
CYCS 54205 Heart failure
ADRB1 153 Heart failure
TAZ 6901 Heart failure
VIP 7432 Heart failure
IGF1 3479 Heart failure
NPPC 4880 Heart failure
ACE 1636 Heart failure
CST3 1471 Heart failure
MUC16 94025 Heart failure
RYR2 6262 Heart failure
Aquaporin-2 359 Congestive heart failure
Aquaporin 2 359 Congestive heart failure
Atrial natriuretic peptide 4878 Congestive heart failure
Brain natriuretic peptide 4879 Congestive heart failure
Phospholamban 5350 Congestive heart failure
CYP2C9 1559 Congestive heart failure
RAGE (receptor) 177 Congestive heart failure
Angiotensin II receptor type 1 185 Congestive heart failure
Programmed cell death 1 5133 Congestive heart failure
AGTR1 185 Congestive heart failure
Atrial natriuretic factor 4878 Congestive heart failure
PDCD1 5133 Congestive heart failure
AGER 177 Congestive heart failure
AQP2 359 Congestive heart failure
PLN 5350 Congestive heart failure
NPPB 4879 Congestive heart failure
NPPA 4878 Congestive heart failure
GroEL 3329 Endocarditis
Ornithine transcarbamylase 5009 Endocarditis
Valosin-containing protein 7415 Endocarditis
Parathyroid hormone 1 receptor 5745 Endocarditis
VDAC1 7416 Endocarditis
RuvB-like 1 8607 Endocarditis
TUBB2A 7280 Endocarditis
ACTG1 71 Endocarditis
ACTC1 70 Endocarditis
PRDX6 9588 Endocarditis
Hyaluronan-mediated motility receptor 3161 Endocarditis
HSPB6 126393 Endocarditis
Parathyroid hormone receptor 1 5745 Endocarditis
VCP 7415 Endocarditis
OTC 5009 Endocarditis
PTH1R 5745 Endocarditis
HSPD1 3329 Endocarditis
HMMR 3161 Endocarditis
RUVBL1 8607 Endocarditis
HCN4 10021 Sick sinus syndrome
Heparin-binding EGF-like growth factor 1839 Aortic valve disease
HBEGF 1839 Aortic valve disease
Von Willebrand factor 7450 Aortic valve stenosis
ADAMTS13 11093 Aortic valve stenosis
VWF 7450 Aortic valve stenosis
Elastin 2006 Supravalvular aortic stenosis
ELN 2006 Supravalvular aortic stenosis
PRG4 10216 Pericarditis
Histamine H3 receptor 11255 Myocardial ischemia
MAP3K7IP1 10454 Myocardial ischemia
Vascular endothelial growth factor A 7422 Myocardial ischemia
Cathepsin L1 1514 Myocardial ischemia
VEGF-A 7422 Myocardial ischemia
VEGFA 7422 Myocardial ischemia
CTSL1 1514 Myocardial ischemia
TAB1 10454 Myocardial ischemia
HRH3 11255 Myocardial ischemia
APOA1 335 Coronary heart disease
APOC3 345 Coronary heart disease
Lipoprotein(a) 4018 Coronary heart disease
Brain natriuretic peptide 4879 Coronary heart disease
Beta-3 adrenergic receptor 155 Coronary heart disease
Insulin-like growth factor 1 3479 Coronary heart disease
Perlecan 3339 Coronary heart disease
PCSK9 255738 Coronary heart disease
Cholesterylester transfer protein 1071 Coronary heart disease
Arachidonate 5-lipoxygenase 240 Coronary heart disease
Apolipoprotein B 338 Coronary heart disease
Apolipoprotein A1 335 Coronary heart disease
Beta-1 adrenergic receptor 153 Coronary heart disease
Apolipoprotein C3 345 Coronary heart disease
Lipoprotein-associated phospholipase A2 7941 Coronary heart disease
NEUROG3 50674 Coronary heart disease
5-lipoxygenase 240 Coronary heart disease
ApoA1 335 Coronary heart disease
CETP 1071 Coronary heart disease
ApoB 338 Coronary heart disease
IGF-1 3479 Coronary heart disease
Insulin-like growth factor-1 3479 Coronary heart disease
ApoCIII 345 Coronary heart disease
PLA2G7 7941 Coronary heart disease
ADRB3 155 Coronary heart disease
ADRB1 153 Coronary heart disease
APOB 338 Coronary heart disease
ALOX5 240 Coronary heart disease
IGF1 3479 Coronary heart disease
NPPB 4879 Coronary heart disease
HSPG2 3339 Coronary heart disease
LPA 4018 Coronary heart disease
CYP7A1 1581 Myocardial infarction
Caspase 3 836 Myocardial infarction
C-reactive protein 1401 Myocardial infarction
Renin 5972 Myocardial infarction
Factor VII 2155 Myocardial infarction
Factor H 3075 Myocardial infarction
Hepatic lipase 3990 Myocardial infarction
Myeloperoxidase 4353 Myocardial infarction
Endothelial protein C receptor 10544 Myocardial infarction
ALDH2 217 Myocardial infarction
C1-inhibitor 710 Myocardial infarction
Basic fibroblast growth factor 2247 Myocardial infarction
Myocyte-specific enhancer factor 2A 4205 Myocardial infarction
5-Lipoxygenase-activating protein 241 Myocardial infarction
RAGE (receptor) 177 Myocardial infarction
OLR1 4973 Myocardial infarction
Beta-1 adrenergic receptor 153 Myocardial infarction
PTGS2 5743 Myocardial infarction
Cholesterol 7 alpha-hydroxylase 1581 Myocardial infarction
GPVI 51206 Myocardial infarction
Adrenomedullin 133 Myocardial infarction
Prostacyclin synthase 5740 Myocardial infarction
Cystatin C 1471 Myocardial infarction
Tenascin X 7148 Myocardial infarction
Thymosin beta-4 7114 Myocardial infarction
GCLM 2730 Myocardial infarction
S100A9 6280 Myocardial infarction
IL1RL1 9173 Myocardial infarction
LGALS2 3957 Myocardial infarction
CKM (gene) 1158 Myocardial infarction
ABCC9 10060 Myocardial infarction
Renalase 55328 Myocardial infarction
VTI1A 143187 Myocardial infarction
MIAT (gene) 440823 Myocardial infarction
BFGF 2247 Myocardial infarction
TMSB4X 7114 Myocardial infarction
CASP3 836 Myocardial infarction
Caspase-3 836 Myocardial infarction
Complement factor H 3075 Myocardial infarction
MEF2A 4205 Myocardial infarction
5-lipoxygenase activating protein 241 Myocardial infarction
Factor VIIa 2155 Myocardial infarction
PROCR 10544 Myocardial infarction
GP6 51206 Myocardial infarction
F7 2155 Myocardial infarction
AGER 177 Myocardial infarction
ADRB1 153 Myocardial infarction
MIAT 440823 Myocardial infarction
CFH 3075 Myocardial infarction
CKM 1158 Myocardial infarction
CRP 1401 Myocardial infarction
LIPC 3990 Myocardial infarction
RNLS 55328 Myocardial infarction
PTGIS 5740 Myocardial infarction
TNXB 7148 Myocardial infarction
SERPING1 710 Myocardial infarction
FGF2 2247 Myocardial infarction
REN 5972 Myocardial infarction
ADM 133 Myocardial infarction
CST3 1471 Myocardial infarction
MPO 4353 Myocardial infarction
ALOX5AP 241 Myocardial infarction
Myoglobin 4151 Acute myocardial infarction
Tissue plasminogen activator 5327 Acute myocardial infarction
MIRN21 406991 Acute myocardial infarction
Apolipoprotein B 338 Acute myocardial infarction
Endothelin 1 1906 Acute myocardial infarction
MMP3 4314 Acute myocardial infarction
Heart-type fatty acid binding protein 2170 Acute myocardial infarction
Alteplase 5327 Acute myocardial infarction
FABP3 2170 Acute myocardial infarction
ApoB 338 Acute myocardial infarction
MB 4151 Acute myocardial infarction
APOB 338 Acute myocardial infarction
PLAT 5327 Acute myocardial infarction
EDN1 1906 Acute myocardial infarction
MIR21 406991 Acute myocardial infarction
Adenosine A1 receptor 134 Myocardial stunning
SOD2 6648 Myocardial stunning
ADORA1 134 Myocardial stunning
MYH7 4625 Endocardial fibroelastosis
Tafazzin 6901 Endocardial fibroelastosis
TAZ 6901 Endocardial fibroelastosis
Nav1.5 6331 Conduction disease
SCN5A 6331 Conduction disease
PRKAG2 51422 Wolff-Parkinson-White syndrome
TNNT2 7139 Restrictive cardiomyopathy
Titin 7273 Hypertrophic cardiomyopathy
CSRP3 8048 Hypertrophic cardiomyopathy
CD36 948 Hypertrophic cardiomyopathy
Myosin binding protein C, cardiac 4607 Hypertrophic cardiomyopathy
MYH7 4625 Hypertrophic cardiomyopathy
MYL9 10398 Hypertrophic cardiomyopathy
TNNT2 7139 Hypertrophic cardiomyopathy
ACTC1 70 Hypertrophic cardiomyopathy
Endothelin 2 1907 Hypertrophic cardiomyopathy
MYL2 4633 Hypertrophic cardiomyopathy
MYH6 4624 Hypertrophic cardiomyopathy
MYBPC1 4604 Hypertrophic cardiomyopathy
MYL3 4634 Hypertrophic cardiomyopathy
JPH2 57158 Hypertrophic cardiomyopathy
MYLK2 85366 Hypertrophic cardiomyopathy
MYBPC3 4607 Hypertrophic cardiomyopathy
CD-36 948 Hypertrophic cardiomyopathy
TTN 7273 Hypertrophic cardiomyopathy
EDN2 1907 Hypertrophic cardiomyopathy
Titin 7273 Dilated cardiomyopathy
CSRP3 8048 Dilated cardiomyopathy
Phospholamban 5350 Dilated cardiomyopathy
Tafazzin 6901 Dilated cardiomyopathy
Beta-1 adrenergic receptor 153 Dilated cardiomyopathy
LMNA 4000 Dilated cardiomyopathy
Palladin 23022 Dilated cardiomyopathy
Fukutin 2218 Dilated cardiomyopathy
TNNT2 7139 Dilated cardiomyopathy
ACTC1 70 Dilated cardiomyopathy
SGCD 6444 Dilated cardiomyopathy
Programmed cell death 1 5133 Dilated cardiomyopathy
LDB3 11155 Dilated cardiomyopathy
ABCC9 10060 Dilated cardiomyopathy
PDCD1 5133 Dilated cardiomyopathy
ADRB1 153 Dilated cardiomyopathy
TTN 7273 Dilated cardiomyopathy
TAZ 6901 Dilated cardiomyopathy
PLN 5350 Dilated cardiomyopathy
PALLD 23022 Dilated cardiomyopathy
FKTN 2218 Dilated cardiomyopathy

Note: In my previous post, ADA was found to be associated with DOID:3363 (coronary arteriosclerosis). This result was not retrieved using SPARQL, and this information is not available on the GeneWiki+ page for ADA. But keep in mind that GeneWiki+ is still under development.

That's it,

Pierre


05 January 2011

Template:Infobox biodatabase

I've just started creating a Wikipedia infobox to annotate the biological databases in Wikipedia. If many articles use this template, it will then be possible to parse them and to create a list of the databases that provide web services, SPARQL endpoints, a download area, etc.
The infobox itself is still a draft, so feel free to modify it or to suggest some other fields in the 'Talk' page.
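To illustrate how such a template could be parsed, here is a minimal sketch that extracts the "|field = value" lines of an infobox from the wikitext of an article. The field names used in the example (title, sparql, url) are hypothetical, since the template is still a draft, and the parser ignores nested templates and multi-line values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive extraction of "|key = value" lines from an infobox (sketch only). */
final class InfoboxParser {
    // One field per line: optional blanks, a pipe, the key, an equals sign, the value.
    private static final Pattern FIELD = Pattern.compile(
        "(?m)^[ \\t]*\\|[ \\t]*([^=|\\r\\n]+?)[ \\t]*=[ \\t]*(.*?)[ \\t]*$");

    /** Returns the non-empty fields in their order of appearance. */
    static Map<String, String> parse(String wikitext) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(wikitext);
        while (m.find()) {
            if (!m.group(2).isEmpty()) {
                fields.put(m.group(1), m.group(2));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical field names, for illustration only.
        String text = "{{Infobox biodatabase\n|title = ExampleDB\n|sparql = \n"
                    + "|url = http://example.org/?a=b\n}}";
        System.out.println(parse(text));
    }
}
```

A list of the databases offering a given service could then be built by keeping the articles whose map contains the corresponding field.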



that's it,

Pierre

06 August 2010

A MediaWiki extension displaying the UCSC Genome Browser

Today I wrote an extension for MediaWiki that displays the UCSC Genome Browser in an HTML <iframe/>. This extension will help my colleagues annotate some candidate genes through our local wiki.

This extension handles a new tag, <ucsciframe>, which takes three required attributes: 'chrom', 'start' and 'end'.

For example
<ucsciframe chrom="chr2" start="98987" end="9879899"/>
The source code for this extension is available online and its documentation is available on www.mediawiki.org.
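As an illustration of what such a tag has to produce, the sketch below turns the three attributes into an <iframe> pointing at the hgTracks page of the UCSC Genome Browser. It is not the source of the extension: the validation rules and the width/height values are assumptions, and no assembly is specified, so the browser would use its default one.

```java
/** Sketch: builds the iframe markup for a genomic position (assumed layout). */
final class UcscIframe {
    static String render(String chrom, int start, int end) {
        // Reject anything that does not look like a chromosome name.
        if (chrom == null || !chrom.matches("chr[0-9A-Za-z_]+")) {
            throw new IllegalArgumentException("bad chrom: " + chrom);
        }
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("bad range: " + start + "-" + end);
        }
        String url = "http://genome.ucsc.edu/cgi-bin/hgTracks?position="
                   + chrom + ":" + start + "-" + end;
        // The width and height are arbitrary values chosen for this example.
        return "<iframe src=\"" + url + "\" width=\"100%\" height=\"400\"></iframe>";
    }

    public static void main(String[] args) {
        System.out.println(render("chr2", 98987, 9879899));
    }
}
```

Checking the attributes before building the URL matters here because the tag content is typed by the users of the wiki.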

That's it !

Pierre

20 September 2009

From FriendFeed to Nucleic Acids Research.

Deepak Singh and Andrew Su have both already posted about it on their blogs: I'm proud to be the second author of a paper published in the "Database Issue" of Nucleic Acids Research.

The Gene Wiki: community intelligence applied to human gene annotation

Jon W. Huss III, Pierre Lindenbaum, Michael Martone, Donabel Roberts, Angel Pizarro, Faramarz Valafar, John B. Hogenesch and Andrew I. Su

Nucleic Acids Research, doi:10.1093/nar/gkp760

What I really like about this paper is how the collaboration started: last year Andrew asked for some help on FriendFeed, the Life Scientists:



... I sent a mail and said I could possibly help, "et voilà"!
Citing Andrew: I'd also be remiss if I didn't also note the critical role online collaboration played in this effort. Of the seven coauthors on this paper, two I've met only once in real life, and two I've never met in person. We are spread over four cities, five organizations, and nine time zones. Initiating and executing this collaboration happened virtually entirely online, aided by the FriendFeed Life Scientists room and Molecular and Cellular Biology WikiProject at Wikipedia. It was an eye-opener in terms of how effective online collaboration can be done.

Andrew, thank you again :-)


Pierre

11 June 2009

A RDF Editor for Media wiki (draft)

(This page was copied from the article I started on mediawiki.org)
I've created an applet that can be used as an RDF editor for MediaWiki, the wiki engine for Wikipedia. (This is mainly a proof of concept; I don't know if I'm going to use this system myself.) The RDF/XML syntax of the document is checked and the document is validated against a (pseudo) RDFS schema. This method was inspired by the one described in the article "Add Java extensions to your wiki".

When a page is opened for editing, the Java applet starts and the user writes an RDF/XML document in an input area.

The RDF/XML syntax is checked. The document is also validated against a schema located in ${MW}/mwrdf/schema.rdf. If there is an error, a message is displayed and the 'Save' button is disabled.
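A minimal sketch of the first half of that check, using only the XML parser of the JDK: it verifies that the text is well-formed XML whose root element is rdf:RDF, and returns an error message otherwise. It does not validate against schema.rdf, and the class and method names are made up for this example; an editor would enable its 'Save' button only when the result is empty.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: well-formedness and root element check for an RDF/XML text. */
final class RdfXmlCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns an error message, or an empty Optional when the text passes. */
    static Optional<String> check(String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            Element root = builder.parse(new InputSource(new StringReader(text)))
                                  .getDocumentElement();
            if (!RDF_NS.equals(root.getNamespaceURI()) || !"RDF".equals(root.getLocalName())) {
                return Optional.of("the root element is not rdf:RDF");
            }
            return Optional.empty();
        } catch (Exception e) {
            return Optional.of(String.valueOf(e.getMessage()));
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\"/>").isPresent());
    }
}
```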

Once the document is saved, it is displayed as a <PRE> section.
Categories are bound to mwrdf/schema.rdf and are added automatically.


The RDF document can then be retrieved with the MediaWiki API.
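For example, the current text of a page can be requested with action=query and prop=revisions. The sketch below only builds the URL; the location of api.php and the page title are placeholders for a local installation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: URL returning the latest revision text of a page through api.php. */
final class MwApiUrl {
    static String contentUrl(String apiBase, String title) {
        return apiBase
            + "?action=query&prop=revisions&rvprop=content&format=xml&titles="
            + URLEncoder.encode(title, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder host and title.
        System.out.println(contentUrl("http://localhost/w/api.php", "My RDF page"));
    }
}
```

The RDF/XML document then has to be extracted from the revision element of the XML response.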


Installation

  • Install the Java JRE (version > 1.6)
  • Make sure the MediaWiki API is enabled for action=query
  • Append the following code at the end of ${MW}/LocalSettings.php
    require_once("mwrdf/RDFEdit.php");
  • Download mwrdf.zip from http://code.google.com/p/lindenb/downloads/list
  • Unzip the file mwrdf.zip in ${MW}
  • Edit the schema mwrdf/schema.rdf (TODO, describe the Schema for ... the schema... :-) )


That's it.
Pierre

04 January 2009

An extension for MediaWiki: displaying a DNA sequence

This post is about a new extension for MediaWiki (the wiki engine of Wikipedia, written in PHP). This was the first extension I wrote: it adds a new custom tag <dnaseq> and simply displays a DNA sequence. Here is a screenshot of this extension installed in my local MediaWiki.


and the source code for this extension is available here:


First we tell MediaWiki about this new extension in ${MWROOT}/LocalSettings.php
require_once("$IP/extensions/dnaseq/dnaseq.php");

Then we register the new feature, a tag named <dnaseq>. Each time MediaWiki finds a <dnaseq> tag, the function myRenderDnaSequence will be called.
$wgHooks['ParserFirstCallInit'][] = 'myDnaSequence';
(...)
// Called once when the parser is initialised; the hook passes the parser itself.
function myDnaSequence( $parser )
{
    $parser->setHook( 'dnaseq', 'myRenderDnaSequence' );
    return true;
}

myRenderDnaSequence is the function returning the formatted DNA sequence:
function myRenderDnaSequence( $input, $args, $parser )
{
    if ( $input === null || $input === '' ) return "";
    $len = strlen( $input );
    $n = 0; // number of bases written so far
    $html = "<div style='padding: 10px; border: 1px black solid; white-space: pre; background-color: white; font-family: courier, monospace; line-height: 13px; font-size: 12px;'>";
    for ( $i = 0; $i < $len; $i++ )
    {
        $c = $input[$i];
        // skip blanks and the position numbers pasted along with the sequence
        if ( ctype_space( $c ) || ctype_digit( $c ) ) continue;
        if ( $n % 60 == 0 )
        {
            // new line of 60 bases, prefixed with the 1-based position
            if ( $n != 0 ) $html .= "<br/>";
            $html .= sprintf( "%06d ", ( $n + 1 ) );
        }
        else if ( $n % 10 == 0 )
        {
            // blank between blocks of 10 bases
            $html .= " ";
        }
        $n++;
        // the tag content is raw user input: escape it before printing
        $base = htmlspecialchars( $c );
        switch ( strtolower( $c ) )
        {
            case "a":
                $html .= "<span style='color:green;'>" . $base . "</span>";
                break;
            case "c":
                $html .= "<span style='color:blue;'>" . $base . "</span>";
                break;
            case "g":
                $html .= "<span style='color:black;'>" . $base . "</span>";
                break;
            case "t":
            case "u":
                $html .= "<span style='color:red'>" . $base . "</span>";
                break;
            default:
                // any other symbol (e.g. a degenerate base)
                $html .= "<span style='text-decoration:blink;color:gray'>" . $base . "</span>";
                break;
        }
        if ( $n % 60 == 0 )
        {
            // end of a full line: print the position of the last base
            $html .= sprintf( " %06d", ( $n ) );
        }
    }
    $html .= "</div>";
    return $html;
}


That's it.
Pierre

11 December 2008

Random notes 2008-12:

Genetic Algorithm


Evolution of Charles Darwin. I've implemented my own version of the Genetic Algorithm described by Roger Alsing in his blog ( http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa ). This algorithm finds the best set of colored triangles that could be used to re-create an original image.



On the left: the original image (via Wikipedia); on the right: the current image generated by the genetic algorithm at generation 240 (population: 20 individuals of 50 triangles). My algorithm is currently running.
The source is available here: http://tinyurl.com/57xaeb
A short doc is available here: http://code.google.com/p/lindenb/wiki/GAMonaLisa
I've also uploaded an executable jar here: http://code.google.com/p/lindenb/downloads/list
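The general shape of such an algorithm can be sketched as follows. This is not the code linked above: the genome is reduced to a plain array of integers, the step that would draw the triangles into pixels is left out, and the selection scheme (keep the better half, refill with mutated copies) is only one possible choice.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.ToLongFunction;

/** Sketch of the fitness function and of one generation (assumed scheme). */
final class TriangleGa {
    /** Sum of squared differences over the R, G and B channels; lower is better. */
    static long distance(int[] target, int[] candidate) {
        long sum = 0;
        for (int i = 0; i < target.length; i++) {
            for (int shift = 0; shift <= 16; shift += 8) {
                int d = ((target[i] >> shift) & 0xFF) - ((candidate[i] >> shift) & 0xFF);
                sum += (long) d * d;
            }
        }
        return sum;
    }

    /** Copies a genome and overwrites one randomly chosen gene. */
    static int[] mutate(int[] genome, Random rnd) {
        int[] child = genome.clone();
        child[rnd.nextInt(child.length)] = rnd.nextInt();
        return child;
    }

    /** Sorts by cost, keeps the better half, refills with mutated survivors. */
    static int[][] nextGeneration(int[][] population, ToLongFunction<int[]> cost, Random rnd) {
        int[][] sorted = population.clone();
        Arrays.sort(sorted, Comparator.comparingLong(cost));
        int keep = Math.max(1, sorted.length / 2);
        for (int i = keep; i < sorted.length; i++) {
            sorted[i] = mutate(sorted[rnd.nextInt(keep)], rnd);
        }
        return sorted;
    }
}
```

In a complete program the cost function would render the triangles of a genome into a pixel array and pass it to distance() together with the pixels of the original image.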

Workbench


I've uploaded a beta version of a spreadsheet-like program that I wrote for the people of my lab.
It was designed to help people handle large tables in a rich graphical environment. It currently performs a few tasks that are common under Unix. For example, it can find the information about a column of SNPs, and I've implemented a grep/awk function that filters the rows with a simple JavaScript expression. The data are stored with the help of the Java BerkeleyDB API to create an index of each row in a table.


This screenshot is a Java JTable displaying the HapMap genotypes for chr1/build36/CEU. The size of the original file is 146 MB.

The tool is available as a java webstart application. See http://code.google.com/p/cephlib/wiki/Workbench.

Wiki


I gave a presentation on how to use a wiki in a lab, using both OWW (OpenWetWare) and Wikipedia. I showed how to edit/follow/track a page ( http://tinyurl.com/6ejw35 ), how to create/discuss a page with templates and categories ( http://tinyurl.com/5l5bw5 ), and how files can be uploaded to a wiki and commented ( http://tinyurl.com/5ouc7y ). I ended with a demo of the Wikipedia API ( http://tinyurl.com/2dp5r4 ).
People were then interested in storing+annotating (linkage) files in a wiki.

FriendFeed


Thank you to all the crowd in FriendFeed. Really motivating.


Pierre

25 May 2008

Leonard Colebrook: Creating a Biography in Wikipedia

Today I created a new article in Wikipedia about Leonard Colebrook, who was an "English medical researcher who introduced the use of Prontosil, the first sulfonamide drug, as a cure for puerperal, or childbed, fever, a condition resulting from infection after childbirth or abortion" (Encyclopaedia Britannica). (Let's be clear, I didn't know who that guy was until today.) Here is how I wrote this article.
First of all, I logged into Wikipedia using my login/password. In the article about Prontosil, I clicked on the link "edit this page" and added a reference to Colebrook:

... [[Leonard Colebrook]] introduced it as a cure for puerperal fever ...

and saved the page.

Clicking on this new link makes Wikipedia open an editor for a new article. The wiki-code below is what I wrote (please note that a few weeks ago I created a tool called XUL4Wikipedia, which I use as a source of shortcuts to edit such articles).



1 {{Infobox Scientist
2 |name = {{PAGENAME}}
3 |box_width =
4 |image =Replace_this_image_male.svg
5 |image_size =150px
6 |caption = {{PAGENAME}}
7 |birth_date = {{Birth date|1883|3|2}}
8 |birth_place = [[Guildford, Surrey]]
9 |death_date = {{Death date and age|1883|3|2|1967|9|27}}
10 |death_place = [[Farnham Common]], [[Buckinghamshire]]
11 |residence =
12 |citizenship =
13 |nationality = [[England]]
14 |ethnicity =
15 |field = [[medicine]]
16 |work_institutions =
17 |alma_mater = [[St Mary's Hospital, London]]
18 |doctoral_advisor =
19 |doctoral_students =
20 |known_for = [[Prontosil]]
21 |author_abbrev_bot =
22 |author_abbrev_zoo =
23 |influences = [[Almroth Wright]]
24 |influenced = [[Peter Medawar]]
25 |prizes = [[Blair Bell medal]] in [[1955]]
26 |religion =
27 |footnotes =
28 |signature =
29 }}
30 '''{{PAGENAME}}''' [[Fellow_of_the_Royal_Society|FRS]] ( {{Birth
31 date|1883|3|2}} – {{Death date|1967|9|27}}) was an
32 [[England|English]] [[physician]] who introduced the use of [[Prontosil]]
33 in [[1935]] as a cure for [[puerperal fever]].
34
35 ==References==
36 *{{cite journal
37 | quotes = yes
38 |last=Dunn
39 |first=P M
40 |authorlink=
41 |year=[[2008]]
42 |month=May
43 |title=Dr Leonard Colebrook, FRS (1883-1967) and the chemotherapeutic
44 conquest of puerperal infection
45 |journal=Arch. Dis. Child. Fetal Neonatal Ed.
46 |volume=93
47 |issue=3
48 |pages=F246-8
49 | publisher = | location = | issn =
50 | pmid = 18426926
51 |doi = 10.1136/adc.2006.104448
52 | bibcode = | oclc =| id = | url = | language = | format = | accessdate =
53 | laysummary = | laysource = | laydate = | quote =
54 }}
(...)
191 ==See also==
192 * [[Prontosil]]
193
194 {{Persondata
195 |NAME =Colebrook, Leonard
196 |ALTERNATIVE NAMES =
197 |SHORT DESCRIPTION = English physician who introduced the use of
198 [[Prontosil]] in [[1935]] as a cure for [[puerperal fever]]
199 |DATE OF BIRTH = {{Birth date|1883|3|2}}
200 |PLACE OF BIRTH = [[Guildford, Surrey]]
201 |DATE OF DEATH = {{Death date|1967|9|27}}
202 |PLACE OF DEATH = [[Farnham Common]], [[Buckinghamshire]]
203 }}
204
205
206 {{physician-stub}}
207
208 {{DEFAULTSORT:Colebrook, Leonard}}
209 [[Category:1883 births]]
210 [[Category:1967 deaths]]
211 [[Category:British physicians]]




  • 1-29 (Infobox Scientist) and 194-203 (Persondata) are respectively an infobox and a source of metadata about individuals. I guess those structures can be parsed and interpreted by some other tools such as Freebase or DBpedia.
  • 4: I could not find any picture of Colebrook on http://commons.wikimedia.org. I put this link to a SVG figure as I don't have any image. It is then possible to answer the question: "who is missing a portrait ?"

  • 7 and 9: these are templates (~macros) which format the dates. The latter template calculates and prints the age of the individual at his death

  • 36-190: I searched for references about Colebrook on PubMed using the query "Colebrook L[PS]" ([PS] stands for Personal Name as Subject). The nine articles found were saved as XML and transformed to wiki-code using xsltproc and my XSLT stylesheet pubmed2wiki

  • 206: just a signal to say "this article needs to be improved". It is then possible to answer the question: "what are the medical biographies which need to be improved ?"

  • 208: this template is used by wikipedia to sort the results of a query

  • 209-211: "Categories provide automatic indexes that are useful as tables of contents. Together with links and templates they structure a project."
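As an illustration of what the template on line 9 computes, here is a small sketch of the age calculation with the dates used in the infobox. It only shows the arithmetic; it is not how MediaWiki implements the template.

```java
import java.time.LocalDate;
import java.time.Period;

/** Sketch: number of completed years between a birth date and a death date. */
final class AgeAtDeath {
    static int years(LocalDate birth, LocalDate death) {
        return Period.between(birth, death).getYears();
    }

    public static void main(String[] args) {
        // Dates taken from lines 7 and 9 of the wiki-code above.
        System.out.println(years(LocalDate.of(1883, 3, 2), LocalDate.of(1967, 9, 27))); // prints 84
    }
}
```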



That's it
Pierre

01 November 2007

Pubmed2Wikipedia

I've created a Java tool called pubmed2wikipedia: I wrote it to quickly create a new entry for Wikipedia.
First, the user selects a set of articles about a given subject from PubMed; the software then downloads, prepares and formats the data for a new Wikipedia page. For example, it creates the 'references' part and suggests the Categories from the MeSH terms. I've also included a dictionary which recognizes some regex patterns to help create Wikipedia internal links.
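The formatting of the 'references' part can be sketched as below. This is not the code of pubmed2wikipedia: it assumes the fields of an article have already been extracted into a map, and it simply skips the empty ones when writing a {{cite journal}} template.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: writes a {{cite journal}} template from already extracted fields. */
final class CiteJournal {
    static String format(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("*{{cite journal\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // Empty fields are left out rather than written as "|key=".
            if (e.getValue() == null || e.getValue().isEmpty()) continue;
            sb.append(" |").append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> article = new LinkedHashMap<>();
        article.put("last", "Dunn");
        article.put("first", "P M");
        article.put("journal", "Arch. Dis. Child. Fetal Neonatal Ed.");
        article.put("pmid", "18426926");
        System.out.println(format(article));
    }
}
```

A value containing a pipe character would have to be escaped before being written into the template.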
I first tried to use my own tool to create an entry about NSP3, a viral protein I studied during my PhD, but with hundreds of articles I felt I was no longer an expert on this protein :-) so I created a small article about another protein: RoXaN.

I hosted this tool on http://code.google.com. It is available at: http://lindenb.googlecode.com/files/pubmed2wikipedia.jar

Pierre

22 April 2007

Freebase !

I finally received my invitation to join Freebase. Freebase, which was previously introduced by Tim O'Reilly, is a structured semantic wiki. I tested it today: this is a great product. I see the whole site as an RDF/RDFS editor where you can define classes and properties and create instances. As a consequence, Freebase is far more structured than Wikipedia.

The site comes with a complete API (MQL, the Metaweb Query Language, which looks like SPARQL) that can be used to query Freebase and to create your own application (e.g. see CineSpin).

Example: searching for physicists born between 1800 and 1900:


{
  "query":[{
    "/people/person/date_of_birth":null,
    "/people/person/date_of_birth<":"1900",
    "/people/person/date_of_birth>=":"1800",
    "limit":35,
    "name":null,
    "type":"/science/physicist"
  }]
}


Result:


{
  "result":[{
    "/people/person/date_of_birth":"1879-03-14",
    "name":"Albert Einstein",
    "type":"/science/physicist"
  },{
    "/people/person/date_of_birth":"1878",
    "name":"Lise Meitner",
    "type":"/science/physicist"
  },{
    "/people/person/date_of_birth":"1867",
    "name":"Marie Curie",
    "type":"/science/physicist"
  },{
    "/people/person/date_of_birth":"1844",
    "name":"Ludwig Boltzmann",
    "type":"/science/physicist"
  }],
  "status":"/mql/status/ok"
}


For bioinformatics, many types could be created (I have no time to play with this right now!): Molecular Interactions, Biologists, etc.

Pierre

12 April 2007

Web2.0 and Science: A Presentation using SLIDY

(via Sun) Slidy is a purely web based presentation tool that can be displayed in a modern browser. No need to mail slides around the world and clutter email boxes, no need for the recipient to download a huge binary: just send someone a URL to your slides.

I've tested Slidy tonight by writing a short presentation about my thoughts on the web2.0 and science. You can read this presentation at:



Pierre

11 April 2007

How blast works ?

A few years ago, I wondered how blast was implemented: was there a way to play with the binary files where the sequences were indexed? I had a glance at the NCBI C toolkit but I was a little bit lost with all that source code. I asked the question via Usenet and I received a mail from M. Dumontier, who suggested that I have a look at the SLRI toolkit:

The Samuel Lunenfeld Research Institute (SLRI) Toolkit is a cross-platform toolkit for manipulating biological information. The SLRI toolkit is based mainly in C and derives many functions from the NCBI toolkit. The SLRI toolkit was developed mainly for data pertaining to protein structure and function but can be used to manipulate other data such as gene sequences.

Last Sunday, I added a new short entry in Wikipedia about formatdb and I wondered again how the software was implemented: what is the format of those files? How are the proteins and the degenerate nucleotides packed? Could I implement a reader/writer in another language (Java?)? Just out of curiosity, I would be interested in some more information about how blast was implemented. Feel free to add some more information about this subject in Wikipedia.

Pierre


PS: The problem with wikipedia via http://xkcd.com/ :-)

22 February 2007

A Brief History Of Sciences

(Introduction copied from DBPedia): Wikipedia is the by far largest available encyclopedia on the Web. Wikipedia has the problem that its search capabilities are limited to full-text search, which only allows very limited access to this valuable knowledge-base. Semantic Web technologies enable expressive queries against structured information on the Web. The Semantic Web has the problem that there is not much RDF data online yet and that up-to-date terms and ontologies are missing for many application domains. The DBPedia project approaches both problems by extracting structured information from Wikipedia and by making this information available on the Semantic Web. A major feature of DBPedia is to enable sophisticated queries against Wikipedia using SPARQL. Extracting structured information from Wikipedia leads to quite astonishing query answering possibilities.

In February, the DBpedia dataset was made available for download.

For example, here are the N3 statements about Francis Crick that were extracted from Wikipedia by DBpedia.

<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/name> "Francis Crick" .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/image> "FrancisHarryComptonCrick.jpg" .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/caption> "Francis Harry Compton Crick" .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/birth_date> "19160608"^^<http://www.w3.org/2001/XMLSchema#Date> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/birth_place> <http://en.wikipedia.org/wiki/Weston_Favell> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/birth_place> <http://en.wikipedia.org/wiki/Northamptonshire> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/birth_place> <http://en.wikipedia.org/wiki/UK> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/residence> <http://upload.wikimedia.org/wikipedia/commons/a/ae/Flag_of_the_United_Kingdom.svg> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/residence> <http://en.wikipedia.org/wiki/UK> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/residence> <http://upload.wikimedia.org/wikipedia/commons/a/a4/Flag_of_the_United_States.svg> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/residence> <http://en.wikipedia.org/wiki/USA> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/nationality> "[[Image:Flag_of_the_United_Kingdom.svg|20px|]] [[England|English]]" .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/death_date> "20040728"^^<http://www.w3.org/2001/XMLSchema#Date> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/death_place> <http://en.wikipedia.org/wiki/San_Diego> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/death_place> <http://en.wikipedia.org/wiki/California> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/death_place> <http://en.wikipedia.org/wiki/USA> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/field> <http://en.wikipedia.org/wiki/Biophysics> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/work_institution> <http://en.wikipedia.org/wiki/Salk_Institute> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/alma_mater> "[[University College London]][[University of Cambridge]]" .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/doctoral_advisor> <http://en.wikipedia.org/wiki/Max_Perutz> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/doctoral_students> "None" .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/known_for> "[[DNA|DNA structure]], [[consciousness]]" .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://3ba.se/wikipedia/attributes/prizes> "[[Image:Nobel.png|20px]] [[Nobel Prize for Physiology or Medicine|Nobel Prize]] (1962)" .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://en.wikipedia.org/wiki/Template:infobox_scientist> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> "Francis Crick" .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://en.wikipedia.org/wiki/Category:Nobel_laureates_in_Physiology_or_Medicine> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://en.wikipedia.org/wiki/Category:English_neuroscientists> .
<http://en.wikipedia.org/wiki/Francis_Crick> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://en.wikipedia.org/wiki/Category:English_humanists> .


I've been playing with this source of data to build a dynamic timeline of scientists. A person was selected only if I could find a date of birth/death: some people didn't have this information, or the date was not parseable (e.g. "[[March 11]], [[1922]] ([[Istanbul]], [[Turkey]])"), and I only used the predicates that could be used to find the date. For example, here are a few predicates that could be used to find the date of birth:
http://3ba.se/wikipedia/attributes/birth_date
http://3ba.se/wikipedia/attributes/birthdate
http://3ba.se/wikipedia/attributes/Birthdate
http://3ba.se/wikipedia/attributes/date_birth
http://3ba.se/wikipedia/attributes/datebirth
http://3ba.se/wikipedia/attributes/date_of_birth
http://3ba.se/wikipedia/attributes/dateofbirth
http://3ba.se/wikipedia/attributes/DateOfBirth
http://3ba.se/wikipedia/attributes/DATE_OF_BIRTH
http://en.wikipedia.org/wiki/Category:1945_births
(...)
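The lookup described above can be sketched as follows: try each candidate predicate in turn and keep the first value that parses as a yyyyMMdd date, such as the "20040728" literal shown for Francis Crick. This is not the code of the application; the map of attributes is a hypothetical representation of the statements about one person, and only the compact date format is handled.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Sketch: finds a birth date among several possible predicates. */
final class BirthDateFinder {
    private static final String NS = "http://3ba.se/wikipedia/attributes/";
    /** Local names taken from the list above. */
    static final List<String> PREDICATES = List.of(
        "birth_date", "birthdate", "Birthdate", "date_birth", "datebirth",
        "date_of_birth", "dateofbirth", "DateOfBirth", "DATE_OF_BIRTH");

    /** Parses a yyyyMMdd literal; anything else gives an empty result. */
    static Optional<LocalDate> parse(String literal) {
        try {
            return Optional.of(LocalDate.parse(literal.trim(), DateTimeFormatter.BASIC_ISO_DATE));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** attributes: predicate URI to literal value, for one person (assumption). */
    static Optional<LocalDate> find(Map<String, String> attributes) {
        for (String name : PREDICATES) {
            String value = attributes.get(NS + name);
            if (value == null) continue;
            Optional<LocalDate> date = parse(value);
            if (date.isPresent()) return date;
        }
        return Optional.empty();
    }
}
```

A person for whom find() returns an empty result would simply be left out of the timeline.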


I also used relationships between persons that could be found in the database. e.g.:


<http://en.wikipedia.org/wiki/Hermann_Joseph_Muller> <http://3ba.se/wikipedia/attributes/teachers> <http://en.wikipedia.org/wiki/Thomas_Hunt_Morgan> .


The result is a java application called WikiStory.
It requires Java Web Start 1.6. You can use the following command line:

javaws http://www.urbigene.com/wikistory/wikistory.jnlp



WikiStory



Selecting one or more categories will select all the people that belong to them.
Clicking on a person in the timeline will load the page from wikipedia.


The timeline can also be saved as an SVG picture. If you're using Firefox, you can see an example here: geneticists


See also:

http://www.wikitimescale.org/index.php
http://meta.wikimedia.org/wiki/EasyTimeline
http://www.futureswatch.org/Timeline.htm
http://www.todayinsci.com/
http://www.perseus.tufts.edu/
http://dandelife.com/
http://www.timelineindex.com/content/home.php








Pierre