Showing posts with label citation. Show all posts

04 June 2011

Pubmed: sorting the articles on the number of times they've been cited

In 2008 I used www.eigenfactor.org/ to sort a set of Pubmed articles on the impact factor of the journal. In the current post I will show how I used NCBI ELink to sort articles on the number of times they have been cited by other articles in PubMed Central.

The NCBI ELink API checks for the existence of an external or Related Articles link from a list of one or more primary IDs. It can be used to retrieve the articles in PubMed Central citing a given PMID.
For example, the following URI: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?retmode=xml&dbfrom=pubmed&id=19755503&cmd=neighbor returns the 3 articles that cited the Gene Wiki paper:

(...) <LinkSetDb>
<DbTo>pubmed</DbTo>
<LinkName>pubmed_pubmed_citedin</LinkName>
<Link>
<Id>21516242</Id>
</Link>
<Link>
<Id>21062808</Id>
</Link>
<Link>
<Id>20334642</Id>
</Link>
</LinkSetDb>
(...)
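To make the ELink reply above concrete, here is a minimal Java sketch that pulls the citing PMIDs out of such a document. It is not the program described in this post: the class and method names are hypothetical, and the sample XML is the excerpt above wrapped in the eLinkResult/LinkSet elements to make it well-formed. A real reply also carries a DOCTYPE, which the default parser may try to download.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not the author's program): reads citing PMIDs from an ELink reply. */
final class ELinkCitedIn {
    /** Link name used by ELink for "cited in" links, as seen in the excerpt. */
    static final String CITED_IN = "pubmed_pubmed_citedin";

    /** Returns every Id found under a LinkSetDb whose LinkName is pubmed_pubmed_citedin. */
    static List<String> citingPmids(InputStream elinkXml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(elinkXml);
            List<String> pmids = new ArrayList<>();
            NodeList sets = dom.getElementsByTagName("LinkSetDb");
            for (int i = 0; i < sets.getLength(); i++) {
                Element set = (Element) sets.item(i);
                NodeList names = set.getElementsByTagName("LinkName");
                if (names.getLength() == 0
                        || !CITED_IN.equals(names.item(0).getTextContent().trim())) {
                    continue; // some other kind of link, e.g. related articles
                }
                NodeList ids = set.getElementsByTagName("Id");
                for (int j = 0; j < ids.getLength(); j++) {
                    pmids.add(ids.item(j).getTextContent().trim());
                }
            }
            return pmids;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse ELink XML", e);
        }
    }

    /** Convenience overload for XML already held in a String. */
    static List<String> citingPmids(String elinkXml) {
        return citingPmids(new ByteArrayInputStream(
                elinkXml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        // The excerpt from the post, wrapped to be well-formed.
        String sample = "<eLinkResult><LinkSet><LinkSetDb>"
                + "<DbTo>pubmed</DbTo>"
                + "<LinkName>pubmed_pubmed_citedin</LinkName>"
                + "<Link><Id>21516242</Id></Link>"
                + "<Link><Id>21062808</Id></Link>"
                + "<Link><Id>20334642</Id></Link>"
                + "</LinkSetDb></LinkSet></eLinkResult>";
        System.out.println(citingPmids(sample));
    }
}
```

Filtering on the LinkName matters because the same reply can contain other LinkSetDb blocks (related articles, for instance) that should not be counted as citations.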

I wrote a Java program that uses this resource to sort the articles on the number of times they have been cited. The program is available on GitHub.
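The sorting step itself is small. The sketch below is only an illustration of the idea, not the code of the program: it assumes the citation counts have already been collected into a map, and the class name, method name and placeholder identifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: order PMIDs by how often they were cited. */
final class SortByCitations {
    /** Returns the PMIDs, most cited first; ties are broken by PMID for a stable order. */
    static List<String> mostCitedFirst(Map<String, Integer> citationCounts) {
        List<String> pmids = new ArrayList<>(citationCounts.keySet());
        pmids.sort(Comparator
                .comparingInt((String pmid) -> citationCounts.get(pmid))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return pmids;
    }

    public static void main(String[] args) {
        // Placeholder identifiers and counts, made up for the example.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("pmid-a", 3);
        counts.put("pmid-b", 0);
        counts.put("pmid-c", 12);
        System.out.println(mostCitedFirst(counts));
    }
}
```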

Example

Let's sort the articles published in the 2005 NAR-Database Issue:
java -jar dist/pubmedsortbycitations.jar -c -L ALL -e '"Nucleic Acids Res"[JOUR] "Database issue"[ISS] 2005[PDAT]' > sorted.xml
OR
java -jar dist/pubmedsortbycitations.jar -c -L ALL pubmed_result_saved_as.xml > sorted.xml

The output is a set of PubMed XML records sorted by citation count.
The most cited article (cited 290 times) is The Universal Protein Resource (UniProt).
Some articles have never been cited, e.g. Metagrowth: a new resource for the building of metabolic hypotheses in microbiology.

The '-c' option on the command line makes the program insert a new XML node containing the PMIDs of the articles citing each article:
(...)
<ArticleId IdType="pubmed">15608167</ArticleId>
<ArticleId IdType="pmc">PMC540024</ArticleId>
</ArticleIdList>
</PubmedData>
<CitedBy count="290">
<PMID>15608199</PMID>
<PMID>15608238</PMID>
<PMID>15608243</PMID>
<PMID>15769290</PMID>
<PMID>15888679</PMID>
<PMID>15980452</PMID>
(...)
<PMID>21450054</PMID>
<PMID>21453542</PMID>
<PMID>21544166</PMID>
</CitedBy>
</PubmedArticle>
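For readers who want to produce a similar annotation, here is a rough sketch using the standard Java DOM API. It assumes a single PubmedArticle element as input and appends a CitedBy child shaped like the one above; the class and method names are hypothetical and this is not the code of the program itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical sketch: append a CitedBy node to one PubmedArticle record. */
final class CitedByAnnotator {
    /** Parses the record, appends CitedBy as last child of the root, returns the new XML. */
    static String addCitedBy(String pubmedArticleXml, List<String> citingPmids) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pubmedArticleXml)));
            Element citedBy = dom.createElement("CitedBy");
            citedBy.setAttribute("count", String.valueOf(citingPmids.size()));
            for (String pmid : citingPmids) {
                Element child = dom.createElement("PMID");
                child.setTextContent(pmid);
                citedBy.appendChild(child);
            }
            // Placed after PubmedData, as in the excerpt.
            dom.getDocumentElement().appendChild(citedBy);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot annotate record", e);
        }
    }

    public static void main(String[] args) {
        // A heavily trimmed record; real PubMed records contain much more.
        String record = "<PubmedArticle><PubmedData><ArticleIdList>"
                + "<ArticleId IdType=\"pubmed\">15608167</ArticleId>"
                + "</ArticleIdList></PubmedData></PubmedArticle>";
        System.out.println(addCitedBy(record, List.of("15608199", "15608238")));
    }
}
```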



That's it,

Pierre

17 January 2008

Thomson scientific launches www.researcherid.com

http://www.researcherid.com

Thomson Scientific launches researcherid.com to associate researchers with their published works:

Unique Identifier Ensures An Accurate Record Of A Researcher’s Output And Attribution and Builds a World-class Author Community

Researcher ID is a global, multi-disciplinary scholarly research community. Each researcher listed is assigned a unique identifier, to aid in solving the common problem of author misidentification. Search the registry to find citations, collaborators, and more.

see also: http://scientific.thomson.com/press/2008/8429910/

Pierre

03 October 2007

Publish or Perish

FYI: Found today but not tested:


Publish or Perish - A citation analysis software program, designed to help individual academics to present their case for research impact to its best advantage.



Publish or Perish is a software program that retrieves and analyzes academic citations. It uses Google Scholar to obtain the raw citations, then analyzes these and presents the following statistics:

  • Total number of papers
  • Total number of citations
  • Average number of citations per paper
  • Average number of citations per author
  • Average number of papers per author
  • Average number of citations per year
  • Hirsch's h-index and related parameters
  • Egghe's g-index
  • The contemporary h-index
  • The age-weighted citation rate
  • Two variations of individual h-indices
  • An analysis of the number of authors per paper.
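As an illustration of two of the statistics in this list, here is a minimal Java sketch of the h-index and g-index using their usual definitions: h is the largest number such that h papers have at least h citations each, and g is the largest number such that the g most cited papers total at least g² citations. This is not Publish or Perish code; the class name and the citation counts are made up, and g is capped at the number of papers, which is one common convention.

```java
import java.util.Arrays;

/** Hypothetical sketch of two citation indices computed from per-paper citation counts. */
final class CitationIndices {
    /** Returns the counts sorted from most cited to least cited. */
    private static int[] descending(int[] citations) {
        int[] sorted = citations.clone();
        Arrays.sort(sorted); // ascending
        for (int i = 0, j = sorted.length - 1; i < j; i++, j--) {
            int tmp = sorted[i];
            sorted[i] = sorted[j];
            sorted[j] = tmp;
        }
        return sorted;
    }

    /** Largest h such that h papers have at least h citations each. */
    static int hIndex(int[] citations) {
        int[] sorted = descending(citations);
        int h = 0;
        for (int rank = 1; rank <= sorted.length; rank++) {
            if (sorted[rank - 1] >= rank) {
                h = rank;
            } else {
                break; // counts only decrease from here on
            }
        }
        return h;
    }

    /** Largest g (at most the number of papers) whose top g papers total at least g*g citations. */
    static int gIndex(int[] citations) {
        int[] sorted = descending(citations);
        long total = 0;
        int g = 0;
        for (int rank = 1; rank <= sorted.length; rank++) {
            total += sorted[rank - 1];
            if (total >= (long) rank * rank) {
                g = rank;
            }
        }
        return g;
    }

    public static void main(String[] args) {
        int[] citations = {10, 8, 5, 4, 3}; // made-up counts
        System.out.println("h-index = " + hIndex(citations));
        System.out.println("g-index = " + gIndex(citations));
    }
}
```

With the made-up counts above, four papers have at least 4 citations each, so h is 4, and all five papers together have 30 citations, which is at least 25, so g is 5.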

10 June 2007

Mapping NCBI/PUBMED

In my previous post I showed how I used the <Affiliation> tag from the PubMed XML records to extract the e-mail addresses and names of the authors of a paper. I've slightly changed the source code of this program to find the country of origin of each paper. To retrieve the country I used:
1) the suffix of the e-mail address (if any)
2) the name of the country (if any)
3) the name of the city (a few well-known ones such as Stanford, for the US or UK)

My program takes as input a PubMed query and the output is the number of papers per year and per country. I put a few results on ManyEyes. As an example, for the query "Rotavirus" with 1000 records, I was able to assign a country to 887 of them.
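The three rules above can be sketched in Java as follows. This is an illustration only, not the program's source: the class name, the regular expression and the tiny lookup tables are assumptions made for the example, and a real run would need far larger tables.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: guess a country from the text of an Affiliation tag. */
final class AffiliationCountry {
    // Captures the last dot-separated part of an e-mail address, e.g. "fr".
    private static final Pattern EMAIL_SUFFIX =
            Pattern.compile("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)*\\.([a-z]{2,})");

    // Tiny illustrative tables (assumptions, not the author's data).
    private static final Map<String, String> BY_SUFFIX =
            table("fr", "France", "de", "Germany", "jp", "Japan", "uk", "United Kingdom");
    private static final Map<String, String> BY_COUNTRY_NAME =
            table("france", "France", "germany", "Germany", "japan", "Japan",
                  "united kingdom", "United Kingdom", "usa", "United States");
    private static final Map<String, String> BY_CITY =
            table("stanford", "United States", "oxford", "United Kingdom");

    private static Map<String, String> table(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    /** True if the word occurs as a whole word, so "usa" does not match "lausanne". */
    private static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    private static Optional<String> lookup(String text, Map<String, String> names) {
        for (Map.Entry<String, String> entry : names.entrySet()) {
            if (containsWord(text, entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    /** Applies the rules in order: e-mail suffix, then country name, then city name. */
    static Optional<String> guessCountry(String affiliation) {
        String text = affiliation.toLowerCase(Locale.ROOT);
        Matcher mail = EMAIL_SUFFIX.matcher(text);
        if (mail.find() && BY_SUFFIX.containsKey(mail.group(1))) {
            return Optional.of(BY_SUFFIX.get(mail.group(1)));
        }
        Optional<String> byName = lookup(text, BY_COUNTRY_NAME);
        return byName.isPresent() ? byName : lookup(text, BY_CITY);
    }

    public static void main(String[] args) {
        // Made-up affiliation strings.
        System.out.println(guessCountry("Some Institute, Nantes. someone@example.fr"));
        System.out.println(guessCountry("Dept. of Genetics, Stanford University, CA."));
        System.out.println(guessCountry("University of Lausanne"));
    }
}
```

Returning an empty Optional for unmatched affiliations keeps them out of the per-country totals instead of assigning them to a wrong country.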






Publications in "Bioinformatics", "BMC Bioinformatics", "Plos Comp. Biol."

Publications about "Rotavirus"

Publications about malaria, anopheles, plasmodium etc.

10 May 2007

Structured Abstracts

There was a discussion about creating semantic/structured abstracts of papers in Nautilus, in Neil's blog, and I also suggested it before. We can fantasize about this, but for this and other things (tagging, unique IDs for authors, ...) we are still dependent on the NCBI. A central repository of those structured abstracts could be created, but I fear it would be filled in by only a few people. Moreover, text mining on PubMed can give some good results: see http://www.ihop-net.org/UniPub/iHOP/ (there may be other refs but I like this one).

Pierre