Double-blind peer review is a bad idea
When you submit a manuscript to a journal or to a conference, you do not know who reviews your manuscript. Increasingly, due to concerns with biases and homophily, journals and conferences are moving to a double-blind peer review where you have to submit your paper without disclosing your identity. There is also a competing move toward more openness where everyone’s identity is disclosed.
The intuition behind double-blind review is that it is harder to discriminate against people if you do not know their name and affiliation. Of course, editors and chairs still get to know your identity. The intuition behind open peer review is that if your reviews are published, you will be kept in check and may get punished if you are too biased. But people are concerned about their reviews or the reviews of their papers being published.
There are many undesirable biases involved in a professional setting. Of course, there are undesirable biases against some minorities and women. There are other biases as well. There are indications that the prestige of the author can be a determining factor when judging a piece of work. People generally tend to rate people who are like themselves more highly. There are undesirable orthodoxy biases as well: uncommon ideas are far more difficult to defend even when the most common ideas have not been revisited lately. Conventional affiliations are more highly rated than unconventional affiliations.
Yet we should not immediately accept that hiding the identity of the author is the solution. The mere fact that we recognize a problem, and that there is some action related to the problem, does not imply that we must proceed with that action. Our tendency to do so relies on a fallacy known as the politician’s syllogism.
The Australian government, motivated by a study that claimed blind auditions helped women, conducted an extensive evaluation of blind interviews and found the following:
This study assessed whether women and minorities are discriminated against in the early stages of the recruitment process for senior positions in the Australian Public Service (APS). It also tested the impact of implementing a ‘blind’ or de-identified approach to reviewing candidates. Over 2,100 public servants from 15 agencies participated in the trial. They completed an exercise in which they shortlisted applicants for a hypothetical senior role in their agency. Participants were randomly assigned to receive application materials for candidates in standard form or in de-identified form (with information about candidate gender, race and ethnicity removed). Overall, the results indicate the need for caution when moving towards ’blind’ recruitment processes in the APS, as de-identification may frustrate efforts aimed at promoting diversity.
To be clear, what they found was the reverse of what they were expecting: blinding interviews made things slightly worse for women.
Ersoy and Pate find that the current non-blind peer review process favours women:
Our results suggest that male economists at top institutions benefit the most from non-blind evaluations, followed by female economists (regardless of their institution).
They find a bias against males at non-elite institutions.
And the study showing that blind interviews helped women get hired by orchestras? Its statistical analysis does not stand up to scrutiny. And the left-leaning New York Times has recently published an essay arguing that blind interviews make orchestras less diverse.
Clearly, we believe that we can effectively combat undesirable prejudices in hiring since most employers do not hire based on a double-blind process. PhD students submit their thesis for review without hiding their name. Nobody is advocating that research papers be published anonymously as a rule. Nobody is advocating that we stop broadcasting the name of our employers, where we got our degrees and so forth. Nobody is advocating that when we report on a research result, we hide the name of the journal… Yet if we wanted to present pure research results, that is what we would do: hide affiliations, journal names, author names.
So why would we not want to hide the identity of the researchers during peer review despite the apparent advantages?
Firstly, the evidence for the benefits of double-blind peer reviews is a set of anecdotes. Double-blind experiments can bring biases to light the same way a microscope can show you a bacterium: they are great inquiry tools, but not necessarily cures. What is scientific fact is that people have biases, that they exhibit homophily, and that you can, up to a point, anonymize content. However, the evidence for benefits is mixed. It is not clear that it helps women, for example. Do we get more participation from people outside the major universities over time under double-blind peer review? We do not know. Major conferences that did switch to double-blind peer review, like NeurIPS, are heavily dominated by a few elite institutions with almost no outsiders.
Secondly, telling someone from a poorly known organization, from a poor or non-English-speaking country, or with a non-dominant gender identity that they need to hide who they are to be treated fairly is not entirely a positive message. I certainly want to live in a world where a woman can publish her work as a woman. Stressing biases without properly addressing them can render fields unattractive to those who might suffer from these biases.
Another concern is that double-blind review renders open scholarship difficult. I have been posting most of my papers online, prior to peer review, on arXiv or other servers, sometimes years before they are even submitted. I write all my software openly, engaging freely with multiple engineers and researchers. I practice what I call open scholarship. Obviously, it means I cannot reasonably take part in double-blind venues. Making open scholarship more difficult seems like a step backward. You can argue that you can still anonymize your contributions, in a bureaucratic manner, for the few days that the review lasts. But such a proposal dismisses the fact that open scholarship is primarily a cultural practice founded on the idea that the research happens in free and open networks.
And what happens after the work has been accepted? When the referees are biased, why would the readers not be biased as well? What is more important, the readers or the reviewers? Do we write papers to be published or to be read? I vote for the latter without hesitation. Yet, at best, double-blind peer review might help with getting papers accepted, but it does nothing for post-publication assessment. It is almost as if we thought that the end goal of the game was to get the research published in prestigious venues. Are we all about maximizing the impact factor or do we care to produce impactful research? If you are to be consistent with your beliefs, then if you promote double-blind peer review, you should also demand that we stop cataloguing and broadcasting affiliations. At a minimum, we should downplay the names of the authors: if we include them at all, they should be at the end of the paper, in small characters. If you are consistent with your beliefs, you should never, ever, give lists of names with affiliations. It seems logically incoherent for someone from an elite institution to be arguing for double-blind peer review while visibly broadcasting their elite institution. In part, I believe that they end up with such an illogical result because they start from a fallacy, the politician’s syllogism.
The San Francisco Declaration on Research Assessment tells us: “When involved in committees making decisions about funding, hiring, tenure, or promotion, make assessments based on scientific content rather than publication metrics.” Focusing on how papers get accepted misses the point of what we want to value. Yet a direct consequence of double-blind peer review is to make highly selective paper acceptance socially and politically more sustainable.
There is no free lunch. Double-blind peer review is not without cost.
Blank reported that authors from outside academia have a lower acceptance rate under double-blind peer review, presumably because reviewers, when they can, tend to give a chance to outsiders despite the fact that outsiders do not conform to the field’s orthodoxy as well as insiders may. Moreover, Blank indicates that double-blind peer review is overall harsher.
This “harsh” nature has been replicated and quantified. Double-blind peer review manuscripts are less likely to be successful than single-blind peer review manuscripts.
So there are unintended consequences to double-blind peer review. Having harsher reviews and lower acceptance rates may not be a positive. A student may think: “Why continue to seek approval, when you can leave science and do something else where you’ll be appreciated?”
And is the harsh nature entirely a side-effect? The introduction of double-blind peer review is partly justified by the mission we give the reviewers: select only the very best work. Once we relax this constraint on reviewers, double-blind peer review becomes much less necessary. In some sense, double-blind peer review is a way to make an elitist system socially acceptable.
If we want, for example, to increase the representation of women, there are potentially other means that are less intrusive and more positive, like, for example, including more women in the peer review process as reviewers, editors and so forth. The same applies to other biases. For example, you should ensure that people from small colleges are represented, or from poorer or non-English countries. And what about including people who have less orthodox ideas? What about including more outsiders? What about what Stonebraker might call “consumers of the research”? Look at the most desirable conferences in computer science that have adopted double-blind peer review. How many are chaired by people from non-elite institutions? When they organize plenary talks, how many are from non-elite institutions?
At a minimum, if we want to get more constructive reviews, we should give serious consideration to the demand that pre-publication peer reviews be published. Transparency is a good, practical strategy to fight undesirable biases and get people to be more constructive. We should be mindful that blinding a process, everything else being equal, makes it less transparent. In an open system, if I give rave reviews to my friends, and harsh reviews to ideas that I hate, I risk being exposed. In a fully blinded process, I can always claim impartiality. If everyone is blinded bureaucratically, people with unacceptable biases can maintain plausible deniability should they ever be caught.
And here is another idea. Do we need the crazy low acceptance rates? In computer science, it is common that fewer than 15% of all papers are accepted. Do we realize that the outcome is unavoidably a power hierarchy controlled by a select few who pick the winners? By accepting more papers, we would necessarily make biases in peer review less harmful. We would reduce the power of the select few. Open access journals like PLOS One have shown that you can turn peer review away from a selection of the winners to a pruning of the bad research, with good results. The argument used to be that the conference was to be held in a hotel with only so many rooms, but Zoom and YouTube have millions of rooms. Of course, the downside then is that hiring and promotion committees cannot simply count the number of papers at prestigious venues, and they must read the papers and discuss them. It is hard work. And the candidate can no longer just offer a list of papers; they have to explain why their work matters in a way that we can understand.
I do not think that the initial submission is the right time to judge the importance of a piece of work. If you look at even the best venues, most of the accepted papers are not impactful. That’s not the authors’ fault. It is just that really impactful work is rare and unpredictable. And it often takes time before we can recognize it. And different people will value different papers. By insisting that referees can reliably select the very best work, we fail to take into account the thoroughly documented limitations of pre-publication peer review. In some sense, by making it look more objective, we make things worse. We should just acknowledge that pre-publication reviews are intrinsically limited and build the system with these limitations in mind.
Though the problems that double-blind peer review seeks to address are real and significant, double-blind peer review is itself a rather crude and pessimistic solution that has several undesirable consequences. We can do better.
(Presented at the ACM Publications Board Meeting, November 19th 2020)
Further reading: Gender and peer review
Update: I love Peer Review: Implementing a “publish, then review” model of publishing
Appendix: Some selected reactions from twitter…
I agree wholeheartedly with [@lemire](https://twitter.com/lemire?ref_src=twsrc%5Etfw). Fighting nepotism with double blind is like trying to stop a mudslide with your bare hands. It’s a law that the fuzzier the criteria to measure quality, the more success (perceived value) depends on network effects [https://t.co/gSTLeA3npL](https://t.co/gSTLeA3npL) 1/n
— Balázs Kégl (@balazskegl) November 25, 2020
This thread by [@balazskegl](https://twitter.com/balazskegl?ref_src=twsrc%5Etfw) and post by [@lemire](https://twitter.com/lemire?ref_src=twsrc%5Etfw) make some good points.
I’ve never been a big fan of double-blind reviewing. Just like there’s “security theater,” double-blind reviewing seems like “objectivity theater.” It makes people feel better without necessarily helping. https://t.co/nLAZLCCmM3
— Lev Reyzin (@lreyzin) November 25, 2020