14 October 2012

Calculating time from submission to publication / Degree of burden in submitting a paper

After "404 not found": a database of non-functional resources in the NAR database collection, I've uploaded my second dataset on figshare:

Calculating time from submission to publication / Degree of burden in submitting a paper. Pierre Lindenbaum, Ryan Delahanty.
Retrieved 10:13, Oct 14, 2012 (GMT)

This dataset was inspired by this post on biostar, initially asked by Ryan Delahanty: I was wondering if it would be possible to calculate some kind of a metric for the speed-of-publication for each journal. I'm not sure submitted and accepted dates are available for all papers, but I noticed in XML data there are fields like the following:
            <PubMedPubDate PubStatus="received">
            <PubMedPubDate PubStatus="accepted">

In this dataset, the script 'pubmed.sh' downloads the list of journals from http://www.ncbi.nlm.nih.gov/books/NBK3827/table/pubmedhelp.pubmedhelptable45/ and the 'eigenfactors' from http://www.eigenfactor.org.

For each journal, it scans PubMed (starting from year=2000) and gets the difference between the date[@PubStatus='received'] and the date[@PubStatus='accepted'].
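As a rough illustration of that date arithmetic (this is not the code of 'pubmed.sh'), here is a minimal Python sketch. The sample record is invented for the example, and the helper names pub_date and days_to_acceptance are hypothetical; it only assumes that each PubMedPubDate element carries Year, Month and Day children, as in the PubMed XML.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Invented sample record (not a real article), reduced to the History block.
SAMPLE = """<PubmedArticle>
  <PubmedData>
    <History>
      <PubMedPubDate PubStatus="received">
        <Year>2010</Year><Month>1</Month><Day>15</Day>
      </PubMedPubDate>
      <PubMedPubDate PubStatus="accepted">
        <Year>2010</Year><Month>4</Month><Day>25</Day>
      </PubMedPubDate>
    </History>
  </PubmedData>
</PubmedArticle>"""


def pub_date(article, status):
    """Return the PubMedPubDate with the given PubStatus as a date, or None."""
    node = article.find(f".//PubMedPubDate[@PubStatus='{status}']")
    if node is None:
        return None
    try:
        return date(
            int(node.findtext("Year")),
            int(node.findtext("Month")),
            int(node.findtext("Day")),
        )
    except (TypeError, ValueError):
        # Missing or malformed Year/Month/Day: treat the date as unavailable.
        return None


def days_to_acceptance(article):
    """Days between 'received' and 'accepted', or None if either is missing."""
    received = pub_date(article, "received")
    accepted = pub_date(article, "accepted")
    if received is None or accepted is None:
        return None
    return (accepted - received).days


print(days_to_acceptance(ET.fromstring(SAMPLE)))
```

Records without both dates return None, which matches the fact that many journals do not provide them.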

"Acta biochimica Polonica"	0001-527X	0.003996	119.770935960591
"Acta biomaterialia"	1742-7061	0.02152	129.682692307692
"Acta biotheoretica"	0001-5342	0.000844	161.897058823529
"Acta cirurgica brasileira / Sociedade Brasileira para Desenvolvimento Pesquisa em Cirurgia"	0102-8650	0.00128	122.038461538462
"Acta cytologica"	0001-5547	0.002305	65.3006134969325
"Acta diabetologica"	0940-5429	0.001851	299.6
"Acta haematologica"	0001-5792	0.002825	118.654676258993
"Acta histochemica"	0065-1281	0.002162	110.471204188482
"Acta histochemica et cytochemica"	0044-5991	0.000677	81.6455696202532
"Acta neurochirurgica"	0001-6268	0.009685	204.371830985916
"Acta neuropathologica"	0001-6322	0.023471	69.7277882797732
"Acta theriologica"	0001-7051	0.000901	147.0
"Acta tropica"	0001-706X	0.01011	196.577777777778
"Acta veterinaria Scandinavica"	0044-605X	0.001612	82.0
"Addictive behaviors"	0306-4603	0.017915	163.049731182796
"Advances in space research "	0273-1177	0.021217	205.0
"American journal of human genetics"	0002-9297	0.120156	67.1898928024502
"American journal of hypertension"	0895-7061	0.017359	104.074576271186

Here is the kind of figure I got:

As far as I remember, "Cell" is the point having the highest eigenfactor.

Note: PubMed contains some errors, e.g. a received date later than the accepted date ( http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=20591334&retmode=xml ) or dates in the future ( http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12921703&retmode=xml ).
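One possible way to keep such records out of a per-journal average is sketched below in Python. This is an assumption about how one might handle them, not a description of what the dataset does; the function names is_plausible and mean_delay_by_journal and the sample records are made up for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def is_plausible(received, accepted, today=None):
    """True if received <= accepted and neither date lies in the future."""
    today = today or date.today()
    return received <= accepted <= today


def mean_delay_by_journal(records, today=None):
    """Mean received-to-accepted delay (in days) per journal.

    `records` is an iterable of (journal, received, accepted) tuples;
    implausible records are skipped.
    """
    delays = defaultdict(list)
    for journal, received, accepted in records:
        if is_plausible(received, accepted, today):
            delays[journal].append((accepted - received).days)
    return {journal: mean(days) for journal, days in delays.items()}


# Made-up records: two valid ones and the two kinds of error described above.
records = [
    ("Journal A", date(2010, 1, 1), date(2010, 1, 31)),  # 30 days
    ("Journal A", date(2010, 3, 1), date(2010, 3, 11)),  # 10 days
    ("Journal A", date(2010, 6, 1), date(2010, 5, 1)),   # received > accepted
    ("Journal B", date(2011, 1, 1), date(2099, 1, 1)),   # accepted in the future
]

print(mean_delay_by_journal(records, today=date(2012, 10, 14)))
```

With these records only the two valid "Journal A" entries are averaged, and "Journal B" drops out entirely because its only record is implausible.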

That's it,



Mike Taylor said...

But note that some journals give what can most charitably be described as very misleading information on submission-to-acceptance times. See http://svpow.com/2012/10/03/dear-royal-society-please-stop-lying-to-us-about-publication-times/

Christian said...

Awesome. Any reason I don't find bioinformatics papers in the CSV file?

Pierre Lindenbaum said...

@Christian, many journals do not contain any information about the dates.

sharmanedit said...

This is really interesting, Pierre, and I commend you for putting it on figshare. It is useful to know how to extract these dates automatically and manipulate them.
However, the submitted to accepted time is not a particularly useful statistic, as it includes the time the authors take to revise (if revision is invited). More useful would be submission to first decision time (which I think would not be possible to obtain from PubMed) and acceptance to publication (which presumably would). I've written more about acceptance to publication times here: http://sharmanedit.wordpress.com/2012/06/13/acceptance-to-publication-time/
Have you thought about doing a similar analysis for acceptance to publication time? If you do, I would predict a bimodal distribution, with one peak for journals that publish very soon after acceptance before or without copyediting or typesetting, and another peak for those that edit and typeset before publication.

Pierre Lindenbaum said...

(@neilfws posted a comment but I deleted it by mistake)

@sharmanedit I don't think that acceptance to publication is much better as a metric, at least as defined by PubMed. Where publication dates are available, they are given as PubStatus="aheadofprint" or PubStatus="entrez/pubmed/medline". None of these provide much information about editing or typesetting.