10 June 2007

Mapping NCBI/PUBMED

In my previous post I showed how I used the <Affiliation> tag from the PubMed XML records to extract the e-mail addresses and names of a paper's authors. I've slightly changed the source code of that program to find the country of origin of each paper. To determine the country, I used:
1) the suffix of the e-mail address (if any)
2) the name of the country (if any)
3) the name of the city (a few well-known ones, such as Stanford, for the US or UK)
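
The three rules above can be sketched as a simple cascade: try the e-mail suffix first, then a country name, then a city name. This is only a minimal illustration, not the code of NCBIMap.java. The class name CountryGuesser, the method guess and the tiny lookup tables are all hypothetical, and treating ".edu" as a US address is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: guess a country from the text of an Affiliation field. */
public class CountryGuesser {
    // Tiny illustrative tables (assumptions); a real run needs much larger ones.
    private static final Map<String, String> SUFFIX = new LinkedHashMap<>();
    private static final Map<String, String> COUNTRY_NAME = new LinkedHashMap<>();
    private static final Map<String, String> CITY = new LinkedHashMap<>();
    static {
        SUFFIX.put("fr", "France");
        SUFFIX.put("uk", "United Kingdom");
        SUFFIX.put("edu", "USA"); // assumption: .edu addresses are American
        COUNTRY_NAME.put("france", "France");
        COUNTRY_NAME.put("united kingdom", "United Kingdom");
        COUNTRY_NAME.put("usa", "USA");
        CITY.put("stanford", "USA");
        CITY.put("oxford", "United Kingdom");
    }

    // Captures the last dot-separated label of an e-mail address.
    private static final Pattern MAIL =
        Pattern.compile("[\\w.+-]+@[\\w.-]*\\.([A-Za-z]{2,})\\b");

    /** Returns the value of the first key found as a whole word in text, or null. */
    private static String lookup(Map<String, String> table, String text) {
        for (Map.Entry<String, String> e : table.entrySet()) {
            Pattern word = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b");
            if (word.matcher(text).find()) return e.getValue();
        }
        return null;
    }

    /** Applies the three rules in order; returns null when none matches. */
    public static String guess(String affiliation) {
        if (affiliation == null) return null;
        String text = affiliation.toLowerCase(Locale.ROOT);
        // 1) suffix of the e-mail address
        Matcher m = MAIL.matcher(text);
        if (m.find()) {
            String bySuffix = SUFFIX.get(m.group(1));
            if (bySuffix != null) return bySuffix;
        }
        // 2) name of the country
        String byName = lookup(COUNTRY_NAME, text);
        if (byName != null) return byName;
        // 3) name of a well-known city
        return lookup(CITY, text);
    }
}
```

Matching on whole words matters here: a plain substring search for "usa" would also hit "Lausanne".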

My program takes a PubMed query as input and outputs the number of papers per year and per country. I put a few results on ManyEyes. For example, for the query "Rotavirus" with 1000 records, I was able to assign a country to 887 of them.
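
Counting papers per year and per country can be sketched with a nested map. Again, this is an illustration under assumptions rather than the original code: the class name PaperTally, the "unknown" bucket for records without a country, and the tab-separated output are all hypothetical choices.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: count papers per (year, country). */
public class PaperTally {
    // year -> (country -> number of papers); TreeMap keeps both levels sorted.
    private final Map<Integer, Map<String, Integer>> counts = new TreeMap<>();

    /** Records one paper; a null country is counted under "unknown". */
    public void add(int year, String country) {
        String key = (country == null) ? "unknown" : country;
        counts.computeIfAbsent(year, y -> new TreeMap<>())
              .merge(key, 1, Integer::sum);
    }

    /** Number of papers recorded for this year and country (0 if none). */
    public int get(int year, String country) {
        return counts.getOrDefault(year, Collections.<String, Integer>emptyMap())
                     .getOrDefault(country, 0);
    }

    /** One "year<TAB>country<TAB>count" line per pair. */
    public String toTsv() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Map<String, Integer>> byYear : counts.entrySet()) {
            for (Map.Entry<String, Integer> e : byYear.getValue().entrySet()) {
                sb.append(byYear.getKey()).append('\t')
                  .append(e.getKey()).append('\t')
                  .append(e.getValue()).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Keeping an explicit "unknown" bucket makes it easy to see how many records could not be assigned a country.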






Publications in "Bioinformatics", "BMC Bioinformatics", "PLoS Comp. Biol."







Publications about "Rotavirus"







Publications about malaria, Anopheles, Plasmodium, etc.

4 comments:

Egon Willighagen said...

Pierre, where can I give it a try? I want to see how the use of 'cheminformatics' and 'chemoinformatics' is distributed; I expect a US/EU split up, but your tool allows me to visualize the query and see if it is true.

Pierre Lindenbaum said...

Egon, I put the source in http://www.urbigene.com/sandbox/NCBIMap.java
Enjoy

Pierre

Anonymous said...

I have been searching for a way to do exactly what you describe but the link on your post is no longer working (not shocking since it is several years old). Is the program still available?

Pierre Lindenbaum said...

http://code.google.com/p/lindenb/source/browse/trunk/src/java/org/lindenb/tool/oneshot/NCBIMap.java