I have made my feelings about the impact factor well known, but still far too many cling to it as if it were some sort of holy grail. Now it seems I was both wrong and right: it appears to be deeply flawed yet better than any of the alternatives. Readers should weigh in if they disagree. The following is from Retraction Watch's coverage of a recent paper in PLOS Biology that examined the impact factor alongside other methods of assessing scientific merit:
Retraction Watch
We’ve sometimes said, paraphrasing Winston Churchill, that pre-publication peer review is the worst way to vet science, except for all the other ways that have been tried from time to time.
The authors of a new paper in PLOS Biology, Adam Eyre-Walker and Nina Stoletzki, compared three of those other ways to judge more than 6,500 papers published in 2005:
subjective post-publication peer review, the number of citations gained by a paper, and the impact factor of the journal in which the article was published
Their findings?
We conclude that the three measures of scientific merit considered here are poor; in particular subjective assessments are an error-prone, biased, and expensive method by which to assess merit. We argue that the impact factor may be the most satisfactory of the methods we have considered, since it is a form of pre-publication review. However, we emphasise that it is likely to be a very error-prone measure of merit that is qualitative, not quantitative.
(Disclosure: Ivan worked at Thomson Reuters, whose Thomson Scientific division owns the impact factor, from 2009 until the middle of this year, but was at Reuters Health, a completely separate unit of the company.)
Or, put another way, as Eyre-Walker told The Australian:
Scientists are probably the best judges of science, but they are pretty bad at it.
In an accompanying editorial, Jonathan Eisen, Catriona MacCallum, and Cameron Neylon call the paper “important” and acknowledge that the authors found the impact factor “is probably the least-bad metric amongst the small set that they analyse,” but note some limitations:
The subjective assessment of research by experts has always been considered a gold standard—an approach championed by researchers and funders alike [3]–[5], despite its problems [6]. Yet a key conclusion of the study is that the scores of two assessors of the same paper are only very weakly correlated (Box 1). As Eyre-Walker and Stoletzki rightly conclude, their analysis now raises serious questions about this process and, for example, the ~£60 million investment by the UK Government into the UK Research Assessment Exercise (estimated for 2008), where the work of scientists and universities are largely judged by a panel of experts and funding allocated accordingly. Although we agree with this core conclusion and applaud the paper, we take issue with their assumption of “merit” and their subsequent argument that the IF (or any other journal metric) is the best surrogate we currently have.
We have, as Retraction Watch readers may recall, extolled the virtues of post-publication peer review before.