04 June 2011

Pubmed: sorting the articles on the number of times they've been cited

In 2008 I used www.eigenfactor.org/ to sort a set of PubMed articles by the impact factor of their journal. In this post I will show how I used NCBI ELink to sort articles by the number of times they have been cited by other articles in PubMed Central.

The NCBI ELink API checks for the existence of external links or Related Articles links for a list of one or more primary IDs. It can also be used to retrieve the articles in PubMed Central that cite a given PMID.
For example, the following URI: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?retmode=xml&dbfrom=pubmed&id=19755503&cmd=neighbor returns the 3 articles that cited the Gene Wiki paper:

(...)
<LinkSetDb>
  <DbTo>pubmed</DbTo>
  <LinkName>pubmed_pubmed_citedin</LinkName>
  <Link>
    <Id>21516242</Id>
  </Link>
  <Link>
    <Id>21062808</Id>
  </Link>
  <Link>
    <Id>20334642</Id>
  </Link>
</LinkSetDb>
(...)
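A minimal sketch of how such an ELink response could be parsed with the JDK's built-in DOM parser. This is not the code of the program described in this post: the class and method names (CitedInParser, parse) are hypothetical, and the sample string is a trimmed version of the response above wrapped in the eLinkResult root element. Only the LinkSetDb whose LinkName is pubmed_pubmed_citedin is kept, because a plain cmd=neighbor request can return several link sets.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CitedInParser {
    /** Returns the PMIDs found in the pubmed_pubmed_citedin LinkSetDb of an ELink XML response. */
    static List<String> parse(String elinkXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Real responses declare a remote DTD; skip downloading it.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(elinkXml)));
        List<String> ids = new ArrayList<>();
        NodeList linkSetDbs = doc.getElementsByTagName("LinkSetDb");
        for (int i = 0; i < linkSetDbs.getLength(); i++) {
            Element linkSetDb = (Element) linkSetDbs.item(i);
            NodeList names = linkSetDb.getElementsByTagName("LinkName");
            if (names.getLength() == 0
                    || !"pubmed_pubmed_citedin".equals(names.item(0).getTextContent().trim())) {
                continue; // ignore other link sets, e.g. pubmed_pubmed (related articles)
            }
            // Only <Id> elements inside this LinkSetDb; the query's own IdList is outside it.
            NodeList idNodes = linkSetDb.getElementsByTagName("Id");
            for (int j = 0; j < idNodes.getLength(); j++) {
                ids.add(idNodes.item(j).getTextContent().trim());
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<eLinkResult><LinkSet><DbFrom>pubmed</DbFrom>"
                + "<IdList><Id>19755503</Id></IdList>"
                + "<LinkSetDb><DbTo>pubmed</DbTo><LinkName>pubmed_pubmed_citedin</LinkName>"
                + "<Link><Id>21516242</Id></Link><Link><Id>21062808</Id></Link>"
                + "<Link><Id>20334642</Id></Link></LinkSetDb></LinkSet></eLinkResult>";
        System.out.println(parse(sample)); // prints [21516242, 21062808, 20334642]
    }
}
```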

I wrote a Java program that uses this resource to sort articles by the number of times they have been cited. The program is available on GitHub at: .
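Once the citing PMIDs of each article are known, the sort itself is short. A minimal sketch, assuming a hypothetical Article record that holds a PMID and the PMIDs citing it; the actual program works on PubMed XML records and its internals are not reproduced here. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortByCitations {
    /** Hypothetical minimal article: its PMID and the PMIDs of the articles citing it. */
    record Article(String pmid, List<String> citedBy) {
        int citationCount() {
            return citedBy.size();
        }
    }

    /** Returns a new list, most-cited first; List.sort is stable, so ties keep input order. */
    static List<Article> sortByCitations(List<Article> articles) {
        List<Article> sorted = new ArrayList<>(articles);
        sorted.sort(Comparator.comparingInt(Article::citationCount).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Placeholder IDs, not real PMIDs.
        List<Article> articles = List.of(
                new Article("A", List.of()),
                new Article("B", List.of("X", "Y")),
                new Article("C", List.of("Z")));
        for (Article a : sortByCitations(articles)) {
            System.out.println(a.pmid() + "\t" + a.citationCount());
        }
    }
}
```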

Example

Let's sort the articles published in the 2005 NAR-Database Issue:
java -jar dist/pubmedsortbycitations.jar -c -L ALL -e '"Nucleic Acids Res"[JOUR] "Database issue"[ISS] 2005[PDAT]' > sorted.xml
or, from PubMed records already saved in XML format:
java -jar dist/pubmedsortbycitations.jar -c -L ALL pubmed_result_saved_as.xml > sorted.xml

The output is a set of PubMed XML records sorted by the number of citations.
The most cited article (cited 290 times) is "The Universal Protein Resource (UniProt)".
Some articles have never been cited, e.g. "Metagrowth: a new resource for the building of metabolic hypotheses in microbiology".

The '-c' option tells the program to insert a new XML node, <CitedBy>, listing the PMIDs of the articles that cite each article:
(...)
<ArticleId IdType="pubmed">15608167</ArticleId>
<ArticleId IdType="pmc">PMC540024</ArticleId>
</ArticleIdList>
</PubmedData>
<CitedBy count="290">
<PMID>15608199</PMID>
<PMID>15608238</PMID>
<PMID>15608243</PMID>
<PMID>15769290</PMID>
<PMID>15888679</PMID>
<PMID>15980452</PMID>
(...)
<PMID>21450054</PMID>
<PMID>21453542</PMID>
<PMID>21544166</PMID>
</CitedBy>
</PubmedArticle>
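A sketch of how a <CitedBy> node in this format could be appended to a PubmedArticle element with the standard org.w3c.dom API. The class and method names (CitedByNode, appendCitedBy, toXml) are hypothetical; this only mirrors the output format shown above, not the program's actual code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class CitedByNode {
    /** Appends <CitedBy count="n"><PMID>..</PMID>...</CitedBy> as the last child of the article. */
    static Element appendCitedBy(Element pubmedArticle, List<String> citingPmids) {
        Document doc = pubmedArticle.getOwnerDocument();
        Element citedBy = doc.createElement("CitedBy");
        citedBy.setAttribute("count", String.valueOf(citingPmids.size()));
        for (String pmid : citingPmids) {
            Element pmidElement = doc.createElement("PMID");
            pmidElement.setTextContent(pmid);
            citedBy.appendChild(pmidElement);
        }
        pubmedArticle.appendChild(citedBy);
        return citedBy;
    }

    /** Serializes the document without the XML declaration. */
    static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<PubmedArticle><PubmedData><ArticleIdList>"
                + "<ArticleId IdType=\"pubmed\">15608167</ArticleId>"
                + "</ArticleIdList></PubmedData></PubmedArticle>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        appendCitedBy(doc.getDocumentElement(), List.of("15608199", "15608238"));
        System.out.println(toXml(doc));
    }
}
```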



That's it,

Pierre

10 comments:

Egon Willighagen said...

Nice! Do you know how extensive their citation database is? For example, does it accurately calculate your H index?

Pierre Lindenbaum said...

@Egon I think it is restricted to the articles published in PMC.

I don't know if it is accurate enough to compute an h-index (or any other index) but for sure, there are some fun things to do with ELink :-)

Yann said...

@Pierre it looks like you're using the pubmed_pubmed_citedin link, which unfortunately is not described in the link description page:

http://www.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html#pubmed

I could not find any documentation via google, or in the pubmed help. I'm not quite sure how it is calculated, but from the syntax of the link it sounds like it's actually using the full pubmed rather than only PMC.

By the way, you can specify the link directly in the request, so that you don't have to load all of them:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?retmode=xml&dbfrom=pubmed&id=19755503&cmd=neighbor&linkname=pubmed_pubmed_citedin
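Yann's suggestion, written as a small Java helper that builds the ELink request with the linkname parameter. The class and method names are hypothetical; the base URL is the one used in this post (NCBI now serves E-utilities over https). Requires Java 10+ for URLEncoder.encode(String, Charset).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class ELinkUrl {
    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi";

    /** Builds an ELink neighbor request restricted to a single link name. */
    static URI elinkUri(String pmid, String linkName) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        params.put("retmode", "xml");
        params.put("dbfrom", "pubmed");
        params.put("id", pmid);
        params.put("cmd", "neighbor");
        params.put("linkname", linkName); // e.g. pubmed_pubmed_citedin
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(BASE + "?" + query);
    }

    public static void main(String[] args) {
        System.out.println(elinkUri("19755503", "pubmed_pubmed_citedin"));
    }
}
```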

For sure, there are lots of fun things to do with eUtils!

Daniel Mietchen said...

Wouldn't this tool also allow one to (re)test the Open Access citation advantage, by comparing articles that are in the OA subset with those that are not?

Pierre Lindenbaum said...

@Yann, many thanks !

klortho said...

pubmed_pubmed_citedin links are automatically generated as the reciprocal of pubmed_pubmed_refs links. On the link description page, pubmed_pubmed_refs is described as "Citation referenced in PubMed article. Only valid for PubMed citations that are also in PMC.".

So pubmed_pubmed_refs is the list of citations from article x (which must itself be in PMC) to articles in PubMed. Then pubmed_pubmed_citedin is the list of citations from any PMC article to article x.
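The reciprocity described here can be illustrated by inverting a map of pubmed_pubmed_refs results (citing PMC article to the PubMed articles it cites) into a pubmed_pubmed_citedin map. This is a toy sketch with hypothetical names and input, not a description of how NCBI actually builds its links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ReciprocalLinks {
    /** Given refs (citing PMID -> cited PMIDs), returns citedin (cited PMID -> citing PMIDs). */
    static Map<String, List<String>> invert(Map<String, List<String>> refs) {
        Map<String, List<String>> citedIn = new TreeMap<>(); // sorted by cited PMID
        for (Map.Entry<String, List<String>> entry : refs.entrySet()) {
            for (String cited : entry.getValue()) {
                // Each refs link x -> y becomes a citedin link y -> x.
                citedIn.computeIfAbsent(cited, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return citedIn;
    }
}
```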

karthickdawk said...

Hi,
That looks neat. But as I browse through PubMed, the articles are categorized as "Related articles", not as articles that cited the article of interest. Am I missing something here?

Pierre Lindenbaum said...

Yes, it seems that the API has changed.

Unknown said...

I'm just starting to scratch the surface of this, but I am having difficulty finding any useful information on the following link names:

pubmed_pubmed
pubmed_pubmed_citedin
pubmed_pubmed_combined
pubmed_pubmed_five
pubmed_pubmed_reviews
pubmed_pubmed_reviews_five
pubmed_pubmed_refs

Any personal knowledge about them that you could share would be great, as would any useful links to further information. Neither Google nor the NCBI help files have been much help.

Thanks.

Ahmed

Unknown said...

@ahmed
It's been a very long time since you asked your question, but here is your answer:
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html

Maybe this will help other people.