Calculating time from submission to publication / Degree of burden in submitting a paper.
Calculating time from submission to publication / Degree of burden in submitting a paper. Pierre Lindenbaum, Ryan Delahanty.
figshare.
Retrieved 10:13, Oct 14, 2012 (GMT)
http://dx.doi.org/10.6084/m9.figshare.96403
This dataset was inspired by this post on biostar, initially asked by Ryan Delahanty: I was wondering if it would be possible to calculate some kind of a metric for the speed-of-publication for each journal. I'm not sure submitted and accepted dates are available for all papers, but I noticed in XML data there are fields like the following:
    <PubmedData>
      <History>
        <PubMedPubDate PubStatus="received">
          <Year>2011</Year>
          <Month>11</Month>
          <Day>29</Day>
          <Hour>6</Hour>
          <Minute>0</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="accepted">
          <Year>2011</Year>
          <Month>12</Month>
          <Day>20</Day>
          <Hour>6</Hour>
          <Minute>0</Minute>
        </PubMedPubDate>
        (...)
In this dataset, the script 'pubmed.sh' downloads the journals from http://www.ncbi.nlm.nih.gov/books/NBK3827/table/pubmedhelp.pubmedhelptable45/ and the 'eigenfactors' from http://www.eigenfactor.org.
For each journal, it scans pubmed (starting from year=2000) and gets the difference between the date[@PubStatus='received'] and the date[@PubStatus='accepted'].
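The date difference described above can be sketched in a few lines of Python. This is not the code of 'pubmed.sh' (which is not shown here), only a minimal illustration that assumes each record is available as a parsed `<PubmedArticle>` element; the function names `history_date` and `days_received_to_accepted` are made up for this example.

```python
import xml.etree.ElementTree as ET
from datetime import date


def history_date(article, status):
    """Return the History date with the given PubStatus, or None if absent/invalid."""
    node = article.find(
        f".//PubmedData/History/PubMedPubDate[@PubStatus='{status}']"
    )
    if node is None:
        return None
    try:
        return date(
            int(node.findtext("Year")),
            int(node.findtext("Month")),
            int(node.findtext("Day")),
        )
    except (TypeError, ValueError):
        # Missing or malformed Year/Month/Day fields.
        return None


def days_received_to_accepted(article):
    """Number of days between 'received' and 'accepted', or None if either is missing."""
    received = history_date(article, "received")
    accepted = history_date(article, "accepted")
    if received is None or accepted is None:
        return None
    return (accepted - received).days


# Sample record built from the XML fragment quoted in the post.
SAMPLE = """<PubmedArticle><PubmedData><History>
<PubMedPubDate PubStatus="received"><Year>2011</Year><Month>11</Month><Day>29</Day></PubMedPubDate>
<PubMedPubDate PubStatus="accepted"><Year>2011</Year><Month>12</Month><Day>20</Day></PubMedPubDate>
</History></PubmedData></PubmedArticle>"""

print(days_received_to_accepted(ET.fromstring(SAMPLE)))  # 21
```

Records without both dates return `None`, which matches the observation that many journals provide no such dates at all.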
title | issn | eigenfactor | days (received to accepted) |
---|---|---|---|
"Acta biochimica Polonica" | 0001-527X | 0.003996 | 119.770935960591 |
"Acta biomaterialia" | 1742-7061 | 0.02152 | 129.682692307692 |
"Acta biotheoretica" | 0001-5342 | 0.000844 | 161.897058823529 |
"Acta cirurgica brasileira / Sociedade Brasileira para Desenvolvimento Pesquisa em Cirurgia" | 0102-8650 | 0.00128 | 122.038461538462 |
"Acta cytologica" | 0001-5547 | 0.002305 | 65.3006134969325 |
"Acta diabetologica" | 0940-5429 | 0.001851 | 299.6 |
"Acta haematologica" | 0001-5792 | 0.002825 | 118.654676258993 |
"Acta histochemica" | 0065-1281 | 0.002162 | 110.471204188482 |
"Acta histochemica et cytochemica" | 0044-5991 | 0.000677 | 81.6455696202532 |
"Acta neurochirurgica" | 0001-6268 | 0.009685 | 204.371830985916 |
"Acta neuropathologica" | 0001-6322 | 0.023471 | 69.7277882797732 |
"Acta theriologica" | 0001-7051 | 0.000901 | 147.0 |
"Acta tropica" | 0001-706X | 0.01011 | 196.577777777778 |
"Acta veterinaria Scandinavica" | 0044-605X | 0.001612 | 82.0 |
"Addictive behaviors" | 0306-4603 | 0.017915 | 163.049731182796 |
"Advances in space research " | 0273-1177 | 0.021217 | 205.0 |
Ambio | 0044-7447 | 0.007463 | 181.878048780488 |
"American journal of human genetics" | 0002-9297 | 0.120156 | 67.1898928024502 |
"American journal of hypertension" | 0895-7061 | 0.017359 | 104.074576271186 |
(....) |
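The fractional values in the 'days' column suggest a per-journal average over all papers with both dates; that is an assumption, not something stated in the dataset. Under that assumption, the aggregation could look like the sketch below, where `mean_days_per_journal` is a hypothetical helper and the day counts are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_days_per_journal(records):
    """records: iterable of (issn, days) pairs; returns {issn: mean of days}.

    Pairs whose days value is None (no usable dates) are skipped.
    """
    by_journal = defaultdict(list)
    for issn, days in records:
        if days is not None:
            by_journal[issn].append(days)
    return {issn: mean(values) for issn, values in by_journal.items()}


# Invented per-paper day counts, keyed by ISSN.
records = [
    ("0001-527X", 100),
    ("0001-527X", 140),
    ("1742-7061", 130),
    ("1742-7061", None),
]
print(mean_days_per_journal(records))
```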
Here is the kind of figure I got:
As far as I remember, "Cell" is the point having the highest eigenfactor.
Note: pubmed contains some errors, e.g. a received date later than the accepted date (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=20591334&retmode=xml) or some dates in the future (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12921703&retmode=xml).
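One way to guard against both kinds of error is a simple sanity check before a record is counted. The sketch below is only a suggestion; `is_plausible` is a hypothetical helper, and whether the original script filtered such records is not stated.

```python
from datetime import date


def is_plausible(received, accepted, today=None):
    """True if received is not after accepted and neither date is in the future."""
    today = today or date.today()
    return received <= accepted <= today


reference_day = date(2012, 10, 14)

# Normal case: received before accepted, both in the past.
print(is_plausible(date(2011, 11, 29), date(2011, 12, 20), reference_day))  # True
# Received after accepted.
print(is_plausible(date(2011, 12, 20), date(2011, 11, 29), reference_day))  # False
# Accepted date in the future relative to the reference day.
print(is_plausible(date(2011, 11, 29), date(2030, 1, 1), reference_day))  # False
```

Same-day acceptance is treated as plausible here; a stricter rule would be a design choice.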
That's it,
Pierre
But note that some journals give what can most charitably be described as very misleading information on submission-to-acceptance times. See http://svpow.com/2012/10/03/dear-royal-society-please-stop-lying-to-us-about-publication-times/
Awesome. Any reason I don't find bioinformatics papers in the CSV file?
@Christian, many journals do not contain any information about the dates.
This is really interesting, Pierre, and I commend you for putting it on figshare. It is useful to know how to extract these dates automatically and manipulate them.
However, the submitted to accepted time is not a particularly useful statistic, as it includes the time the authors take to revise (if revision is invited). More useful would be submission to first decision time (which I think would not be possible to obtain from PubMed) and acceptance to publication (which presumably would). I've written more about acceptance to publication times here: http://sharmanedit.wordpress.com/2012/06/13/acceptance-to-publication-time/
Have you thought about doing a similar analysis for acceptance to publication time? If you do, I would predict a bimodal distribution, with one peak for journals that publish very soon after acceptance before or without copyediting or typesetting, and another peak for those that edit and typeset before publication.
(@neilfws posted a comment but I deleted it by mistake)
@sharmanedit I don't think that acceptance to publication is much better as a metric, at least as defined by PubMed. Where publication dates are available, they are given as PubStatus="aheadofprint" or PubStatus="entrez/pubmed/medline". None of these provide much information about editing or typesetting.