10 June 2008

Pubmed, impact factors, sorting and FriendFeed

I recently said on twitter that I wished I could sort the articles on pubmed using the impact factors of the journals. What followed was a demonstration of the power of friendfeed, something that Deepak Singh, Pedro Beltrao and others have also observed under other circumstances. Within a day several people joined the conversation on friendfeed and, among them, Lars Juhl Jensen and Deepak suggested that I have a look at http://www.eigenfactor.org, where the Eigenfactor is a measure of a journal's total importance to the scientific community. I must also cite Euan, who was inspired by this discussion and created PubmedFaceoff, a photorealistic variant of the Chernoff Faces visualization technique based on pubmed.

Now let's go back to my sorting problem: I've joined the data from www.eigenfactor.org (with the kind permission of Carl Bergstrom) with the journal lists from http://www.ncbi.nlm.nih.gov/entrez/citmatch_help.html#JournalLists and I've uploaded this new dataset to IBM-ManyEyes.

I wrote a Java program that reads a set of pubmed articles formatted in XML and uses the scoring dataset. The algorithm is trivial: the XML elements of the articles are removed from their parent node, sorted on the 'eigenfactors' looked up from the journal's <NlmId>, and then inserted back.
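The remove/sort/re-insert steps described above can be sketched as follows. This is only a minimal illustration, not the source of the actual program: the class name, the element names (PubmedArticle, NlmUniqueID, PMID), the toy XML and the score values are all assumptions made up for the example, and it uses Java 8 syntax rather than the Java 6 of the original jar. Journals missing from the score table are treated as 0.0 here, which is a choice of this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SortByScoreSketch {
    // Assumed element names; check them against your own pubmed export.
    static final String ARTICLE_TAG = "PubmedArticle";
    static final String JOURNAL_ID_TAG = "NlmUniqueID";

    // Toy input: the PMIDs and journal ids are invented for illustration.
    static final String SAMPLE =
        "<PubmedArticleSet>"
        + "<PubmedArticle><PMID>1</PMID><NlmUniqueID>A</NlmUniqueID></PubmedArticle>"
        + "<PubmedArticle><PMID>2</PMID><NlmUniqueID>B</NlmUniqueID></PubmedArticle>"
        + "<PubmedArticle><PMID>3</PMID><NlmUniqueID>X</NlmUniqueID></PubmedArticle>"
        + "</PubmedArticleSet>";

    // Parse an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Text of the first descendant with the given tag, or "" if there is none.
    static String firstText(Element parent, String tag) {
        NodeList hits = parent.getElementsByTagName(tag);
        return hits.getLength() == 0 ? "" : hits.item(0).getTextContent().trim();
    }

    // Detach the article elements, sort them by descending score, re-attach them.
    static void sortArticles(Document doc, Map<String, Double> scores) {
        Element root = doc.getDocumentElement();
        List<Element> articles = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ARTICLE_TAG.equals(n.getNodeName())) {
                articles.add((Element) n);
            }
        }
        for (Element a : articles) {
            root.removeChild(a);
        }
        // Unknown journals score 0.0 (an assumption of this sketch).
        articles.sort(Comparator.comparingDouble(
            (Element a) -> scores.getOrDefault(firstText(a, JOURNAL_ID_TAG), 0.0)).reversed());
        for (Element a : articles) {
            root.appendChild(a);
        }
    }

    // Convenience helper: sort, then list the PMIDs in document order.
    static List<String> sortedPmids(String xml, Map<String, Double> scores) {
        Document doc = parse(xml);
        sortArticles(doc, scores);
        List<String> pmids = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("PMID");
        for (int i = 0; i < nodes.getLength(); i++) {
            pmids.add(nodes.item(i).getTextContent());
        }
        return pmids;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>();
        scores.put("A", 0.5); // made-up scores, not real Eigenfactor values
        scores.put("B", 1.5);
        System.out.println(sortedPmids(SAMPLE, scores));
    }
}
```

With the made-up scores above, journal B ranks first, then A, then the unscored journal X, so the PMIDs come out in the order 2, 1, 3. Because List.sort is stable, articles from the same journal keep their original relative order.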

The source is available here

The executable jar (containing the scoring dataset) is available here:

Here is an example: I want to sort the articles about Charles Darwin. I've fetched all 372 articles in XML from this query, then ran:
java -jar lindenb/build/sortpubmed.jar ~/pubmed_result.txt > result.xml

Here are the first articles:

* Schmidhuber, Jürgen (Apr. 2008). "Comparing the legacies of Gauss, Pasteur and Darwin". Nature 452 (7187): 530. doi:10.1038/452530b. PMID 18256649.
* Padian, Kevin (Feb. 2008). "Darwin's enduring legacy". Nature 451 (7179): 632-4. doi:10.1038/451632a. PMID 18305520.
* Odling-Smee, Lucy (Mar. 2007). "Darwin and the 20-year publication gap". Nature 446 (7135): 478-9. doi:10.1038/446478a. PMID 17392756.
* Oliveira, João Gama; Barabási Albert-László (Oct. 2005). "Human dynamics: Darwin and Einstein correspondence patterns". Nature 437 (7063): 1251. doi:10.1038/4371251a. PMID 16724015.
* Kohn, David; Murrell Gina, Parker John, Whitehorn Mark (Aug. 2005). "What Henslow taught Darwin". Nature 436 (7051): 643-5. doi:10.1038/436643a. PMID 16079834.
* Ridley, Matt (Sep. 2004). "Crick and Darwin's shared publication in Nature". Nature 431 (7006): 244. doi:10.1038/431244a. PMID 15372004.
* Gruber, J W (Oct. 2001). "Owen was right, as Darwin's work continues". Nature 413 (6857): 669. doi:10.1038/35099725. PMID 11449244.
* Padian, K (Jul. 2001). "Owen's Parthian shot". Nature 412 (6843): 123-4. doi:10.1038/35084289. PMID 11606991.
* Rhodes, F H (1983). "Gradualism, punctuated equilibrium and the Origin of Species". Nature 305 (5932): 269-72. PMID 6353241.
* Maynard-Smith, J (Apr. 1982). "The century since Darwin". Nature 296 (5858): 599-601. PMID 7040979.
* "Darwin's questions" (Jan. 1969). Nature 221 (5178): 313. PMID 4884839.
* Hector, Andy; Hooper Rowan (Jan. 2002). "Ecology. Darwin and the first ecological experiment". Science 295 (5555): 639-40. doi:10.1126/science.1064815. PMID 11809960.
* Corsi (May. 1987). "Further Letters of Darwin: The Correspondence of Charles Darwin". Science 236 (4804): 988-989. doi:10.1126/science.236.4804.988. PMID 17812771.
* Schweber (May. 1985). "Darwin's Earliest Letters: The Correspondence of Charles Darwin". Science 228 (4701): 838-841. doi:10.1126/science.228.4701.838. PMID 17815024.
* Lewin, R (Aug. 1982). "Darwin died at a most propitious time". Science 217 (4561): 717-8. PMID 7048528.
* Gould, S J (Apr. 1982). "Darwinism and the expansion of evolutionary theory". Science 216 (4544): 380-7. PMID 7041256.
* Zirkle (May. 1964). "Charles Darwin". Science 144 (3619): 724-725. doi:10.1126/science.144.3619.724-a. PMID 17807061.
* Cholodny (Nov. 1937). "CHARLES DARWIN AND THE MODERN THEORY OF TROPISMS". Science 86 (2238): 468. doi:10.1126/science.86.2238.468. PMID 17815459.
* Leidy (Sep. 1929). "CEREMONY ATTENDING THE OPENING OF DOWN HOUSE, THE HOME OF CHARLES DARWIN". Science 70 (1810): 228-231. doi:10.1126/science.70.1810.228. PMID 17775389.
* Osborn (Jun. 1929). "GIFT TO DOWN HOUSE OF THE ORIGINAL LETTERS OF CHARLES DARWIN TO FRITZ MULLER". Science 69 (1799): 645. doi:10.1126/science.69.1799.645. PMID 17791947.
* Osborn (Dec. 1926). "A CONTEMPORARY OF CHARLES DARWIN". Science 64 (1669): 623-624. doi:10.1126/science.64.1669.623-a. PMID 17834475.
* Sampson (Sep. 1909). "LETTERS FROM CHARLES DARWIN". Science 30 (766): 303-304. doi:10.1126/science.30.766.303. PMID 17837456.
* Ayala, Francisco J (May. 2007). "Darwin's greatest discovery: design without designer". Proc. Natl. Acad. Sci. U.S.A. 104 Suppl 1: 8567-73. doi:10.1073/pnas.0701072104. PMID 17494753.


Anonymous said...

Wonderful stuff. Can't wait to play around once I am on a faster wifi connection. Especially with trying to visualize the results

Pierre Lindenbaum said...

This is just a trivial Java program, Deepak!
Not a Greasemonkey script working with NCBI :-)

Anonymous said...

Wow, I would never have guessed that "Acta biochimica et biophysica Academiae Scientiarum Hungaricae" is that influential!
This journal leaves Nature, Science and Cell in the dust.

Pierre Lindenbaum said...

Thanks, this is a bug: articles without an ISSN get a wrong impact factor.

Pierre Lindenbaum said...

Kay, this is fixed now.

Linda said...

This is just what we need at our library, but unfortunately I'm getting a Java exception when I try to run this jar on a Mac OS 10.4.11. The error is:

Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:620)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)

Thank you.

Stephan Spitzer

Pierre Lindenbaum said...

Hi,
I compiled this jar under Java 6.0.
You need Java 6.0 to run this application.

Tom said...

Hi, I only recently ran across this. I have Java 6.0 and run it under Windows 7 at an MS-DOS prompt. When the results go to the result.xml file, any viewer for this file, such as Firefox, displays mountains of lines of information. Nothing seems parsed down to just the titles, abstracts, PMIDs, etc.

Please help.


Pierre Lindenbaum said...

Tom, this is old code. Please show me a sample of your output.

Anonymous said...

I tried the .jar today with the .xml output of pubmed, but it seems that there has been some change in the XML structure. I always get the following error message :-(

java -jar Downloads/sortpubmed.jar ~/pubmed_result.xml > result.xml
[Fatal Error] pubmed_result.xml:302:2: The markup in the document following the root element must be well-formed.
org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:208)
at org.lindenb.tinytools.ImpactFactorSorter.main(ImpactFactorSorter.java:272)

Pierre Lindenbaum said...

Your XML is not well-formed.

Alice said...

Hi Pierre, thank you for the program. I am asking a very naive question: how do I run the jar program in order to sort the XML file from the pubmed results? If I double-click on it, I see a window popping up and then disappearing. Thanks to you and to anybody who would like to answer. Alice

Pierre Lindenbaum said...

Hi Alice,
that was an old post (2008)! To run a jar (under Windows?) you first need to open an MS-DOS console window (https://en.wikipedia.org/wiki/MS-DOS) and then execute a command line like 'java -jar nameoftheprogram.jar pubmed.xml'. I hope that helps.

Alice said...

Hi Pierre, thanks for answering so fast. It looks like I need to learn more about starting Java programs in MS-DOS!