10 June 2008

Pubmed, impact factors, sorting and FriendFeed

I recently said on Twitter that I wished I could sort PubMed articles by the impact factors of their journals. What followed was a demonstration of the power of FriendFeed, something also observed in other circumstances by Deepak Singh, Pedro Beltrao and others. Within a day several people joined the conversation on FriendFeed; among them, Lars Juhl Jensen and Deepak suggested that I have a look at http://www.eigenfactor.org, where the Eigenfactor is a measure of a journal's total importance to the scientific community. I must also mention Euan, who was inspired by this discussion and created PubmedFaceoff, a photorealistic variant of the Chernoff Faces visualization technique based on PubMed.

Now back to my sorting problem: I joined the data from www.eigenfactor.org (with the kind permission of Carl Bergstrom) with the journal lists from http://www.ncbi.nlm.nih.gov/entrez/citmatch_help.html#JournalLists and uploaded the resulting dataset to IBM-ManyEyes:

I wrote a Java program that reads a set of PubMed articles formatted as XML and uses the scoring dataset. The algorithm is trivial: the XML elements of the articles are removed from their parent node, sorted by the Eigenfactor looked up from the journal's <NlmId>, and then inserted back.
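The remove/sort/reinsert step can be sketched roughly as follows. This is a minimal illustration, not the actual source linked below: the class name SortPubmedSketch, the sample XML and the scores in main are made up, and it assumes the journal identifier sits in an element named NlmUniqueID inside each article. Articles whose journal is missing from the scoring map get a score of 0 and end up last.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SortPubmedSketch {
    /** Returns the journal score of one article; unknown journals score 0. */
    static double scoreOf(Element article, Map<String, Double> scores) {
        // Assumption: the journal id is stored in an element named NlmUniqueID.
        NodeList ids = article.getElementsByTagName("NlmUniqueID");
        if (ids.getLength() == 0) return 0.0;
        return scores.getOrDefault(ids.item(0).getTextContent().trim(), 0.0);
    }

    /** Detaches the root's child elements, sorts them by descending score, re-appends them. */
    static void sortByScore(Document doc, Map<String, Double> scores) {
        Element root = doc.getDocumentElement();
        List<Element> articles = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) articles.add((Element) n);
        }
        for (Element a : articles) root.removeChild(a);
        // List.sort is stable, so articles with equal scores keep their original order.
        articles.sort(Comparator.comparingDouble((Element a) -> scoreOf(a, scores)).reversed());
        for (Element a : articles) root.appendChild(a);
    }

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up two-article sample and made-up scores, for illustration only.
        Document doc = parse("<PubmedArticleSet>"
                + "<PubmedArticle><PMID>1</PMID><NlmUniqueID>A</NlmUniqueID></PubmedArticle>"
                + "<PubmedArticle><PMID>2</PMID><NlmUniqueID>B</NlmUniqueID></PubmedArticle>"
                + "</PubmedArticleSet>");
        sortByScore(doc, Map.of("A", 0.1, "B", 0.9));
        NodeList pmids = doc.getElementsByTagName("PMID");
        for (int i = 0; i < pmids.getLength(); i++) {
            System.out.println(pmids.item(i).getTextContent());
        }
    }
}
```

Working on the DOM keeps each article element intact, so the output is still a valid article set with only the order changed.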

The source is available here

The executable jar (containing the scoring dataset) is available here:

Here is an example: I want to sort the articles about Charles Darwin. I fetched all 372 articles as XML from this query, then ran:
java -jar lindenb/build/sortpubmed.jar ~/pubmed_result.txt > result.xml

Here are the first articles:

* Schmidhuber, Jürgen (Apr. 2008). "Comparing the legacies of Gauss, Pasteur and Darwin". Nature 452 (7187): 530. doi:10.1038/452530b. PMID 18256649.
* Padian, Kevin (Feb. 2008). "Darwin's enduring legacy". Nature 451 (7179): 632-4. doi:10.1038/451632a. PMID 18305520.
* Odling-Smee, Lucy (Mar. 2007). "Darwin and the 20-year publication gap". Nature 446 (7135): 478-9. doi:10.1038/446478a. PMID 17392756.
* Oliveira, João Gama; Barabási Albert-László (Oct. 2005). "Human dynamics: Darwin and Einstein correspondence patterns". Nature 437 (7063): 1251. doi:10.1038/4371251a. PMID 16724015.
* Kohn, David; Murrell Gina, Parker John, Whitehorn Mark (Aug. 2005). "What Henslow taught Darwin". Nature 436 (7051): 643-5. doi:10.1038/436643a. PMID 16079834.
* Ridley, Matt (Sep. 2004). "Crick and Darwin's shared publication in Nature". Nature 431 (7006): 244. doi:10.1038/431244a. PMID 15372004.
* Gruber, J W (Oct. 2001). "Owen was right, as Darwin's work continues". Nature 413 (6857): 669. doi:10.1038/35099725. PMID 11449244.
* Padian, K (Jul. 2001). "Owen's Parthian shot". Nature 412 (6843): 123-4. doi:10.1038/35084289. PMID 11606991.
* Rhodes, F H (1983). "Gradualism, punctuated equilibrium and the Origin of Species". Nature 305 (5932): 269-72. PMID 6353241.
* Maynard-Smith, J (Apr. 1982). "The century since Darwin". Nature 296 (5858): 599-601. PMID 7040979.
* "Darwin's questions" (Jan. 1969). Nature 221 (5178): 313. PMID 4884839.
* Hector, Andy; Hooper Rowan (Jan. 2002). "Ecology. Darwin and the first ecological experiment". Science 295 (5555): 639-40. doi:10.1126/science.1064815. PMID 11809960.
* Corsi (May. 1987). "Further Letters of Darwin: The Correspondence of Charles Darwin". Science 236 (4804): 988-989. doi:10.1126/science.236.4804.988. PMID 17812771.
* Schweber (May. 1985). "Darwin's Earliest Letters: The Correspondence of Charles Darwin". Science 228 (4701): 838-841. doi:10.1126/science.228.4701.838. PMID 17815024.
* Lewin, R (Aug. 1982). "Darwin died at a most propitious time". Science 217 (4561): 717-8. PMID 7048528.
* Gould, S J (Apr. 1982). "Darwinism and the expansion of evolutionary theory". Science 216 (4544): 380-7. PMID 7041256.
* Zirkle (May. 1964). "Charles Darwin". Science 144 (3619): 724-725. doi:10.1126/science.144.3619.724-a. PMID 17807061.
* Cholodny (Nov. 1937). "CHARLES DARWIN AND THE MODERN THEORY OF TROPISMS". Science 86 (2238): 468. doi:10.1126/science.86.2238.468. PMID 17815459.
* Leidy (Sep. 1929). "CEREMONY ATTENDING THE OPENING OF DOWN HOUSE, THE HOME OF CHARLES DARWIN". Science 70 (1810): 228-231. doi:10.1126/science.70.1810.228. PMID 17775389.
* Osborn (Jun. 1929). "GIFT TO DOWN HOUSE OF THE ORIGINAL LETTERS OF CHARLES DARWIN TO FRITZ MULLER". Science 69 (1799): 645. doi:10.1126/science.69.1799.645. PMID 17791947.
* Osborn (Dec. 1926). "A CONTEMPORARY OF CHARLES DARWIN". Science 64 (1669): 623-624. doi:10.1126/science.64.1669.623-a. PMID 17834475.
* Sampson (Sep. 1909). "LETTERS FROM CHARLES DARWIN". Science 30 (766): 303-304. doi:10.1126/science.30.766.303. PMID 17837456.
* Ayala, Francisco J (May. 2007). "Darwin's greatest discovery: design without designer". Proc. Natl. Acad. Sci. U.S.A. 104 Suppl 1: 8567-73. doi:10.1073/pnas.0701072104. PMID 17494753.

I'm looking for a job!

Hi all,
I've just handed my letter of resignation to my current employer, so I'll be free of my professional obligations on September 1st. From now on I'm looking for a fascinating new job combining biology and informatics near Paris, France. This blog is proof of my passion and motivation for this field. Recruiters can contact me by e-mail (plindenbaum at yahoo fr) and consult my resume on LinkedIn at http://www.linkedin.com/in/lindenbaum.

Pierre Lindenbaum PhD
