Anonymous reviewing: the QWERTY of science

The journal Nature has started a three-month trial of a new peer review system. Here’s how it works: while a paper is sent out for traditional review, the authors can also choose to make it open for comments on the web. Any such comments are public and signed, and the authors can respond to them in public. Then, when making their acceptance decision, the editors take into account both the anonymous reviews and the public online discussion.

Personally, I think this is a phenomenal idea, and I hope it spreads to computer science sooner rather than later. I’ve always been struck by the contradiction between scientists’ centuries-old mistrust of secrecy — their conviction that “only mushrooms grow in the dark” — and their horror at signing their names to their opinions of each other’s work. Are we a bunch of intellectual wusses?

Inspired by Nature’s experiment, I’m going to try an experiment of my own. Rather than develop my views any further (which I don’t feel like doing), I’m just going to stop right here and open the field to comments. Go!

This entry was posted on Saturday, June 10th, 2006 at 2:00 am and is filed under Nerd Interest.

28 Responses to “Anonymous reviewing: the QWERTY of science”

D. Eppstein Says:
Comment #1 June 10th, 2006 at 3:17 am
There are non-anonymous peer reviewing systems in other non-academic areas; I’m thinking less of blog comments (which have little further consequence) and more of, for instance, the systems for counting views, comments, and favorites in Flickr, an online photo sharing system. Some people take these quite seriously: there are fora for posting photos within which one is only welcome if one’s photos have reached a certain threshold of popularity, and only the most popular photos (though the site prefers to call them “interesting”) are shown on the main page.

It can work for generating useful feedback, and I’m happy enough with what I’ve seen of the system on that site. But I’ve also seen similar systems on other sites lead to less positive phenomena: circles of people who always give glowing reviews to each other’s photos regardless of actual merit, a tendency for log-rolling (good reviews granted only in exchange for good reviews of one’s own photos), and (when normalized instead of absolute scores are used) a tendency to hunt out weak victims to give poor ratings (again regardless of merit) in order to make one’s good ratings more valuable.

So I think there’s some value in anonymity, for the same reason there’s value in our election systems being anonymous: it prevents ratings from becoming a commodity to be bought or sold.
Scott Says:
Comment #2 June 10th, 2006 at 4:05 am
So I think there’s some value in anonymity, for the same reason there’s value in our election systems being anonymous: it prevents ratings from becoming a commodity to be bought or sold.

Thanks, David! I actually agree with the above statement. But if we agree that anonymous and signed reviews have complementary weaknesses, then shouldn’t we seek some balance between the two — or at least experiment with both, as Nature is now doing? Research communities are small enough, and their members value their reputations enough, that maybe Flickr-like gaming of the system wouldn’t pay — those who attempted it would be sneered at. And even if not, maybe there’s a mechanism that would ameliorate the problem (“5 out of 8 researchers found the following review helpful…”). I don’t know. The point is that these are empirical questions, and I’m uncomfortable with settling them by appeal to academic tradition.
Robin Hanson Says:
Comment #3 June 10th, 2006 at 6:58 am
There is also the issue of whether all details of the paper should be shown to referees. Perhaps some evidence should be “inadmissible,” as in legal courts. Should referees be allowed to base their decision on the gender or race of the author? On his or her affiliation? Some journals try to hide the author identity from referees to try to avoid these possibilities. Another interesting possibility is to hide the conclusions, but not the method, of the paper. I actually worry more about conclusion-bias than author-bias.
Scott Says:
Comment #4 June 10th, 2006 at 8:04 am
Hi Robin,

I completely agree that there are stronger grounds for authors being anonymous than reviewers. But could author anonymity really work, in an age when a quick Google search is likely to turn up both people who could possibly have written a given paper?

Another interesting possibility is to hide the conclusions, but not the method, of the paper.

I’ve seen plenty of papers that do exactly that. I never realized that, far from being atrociously-written, they were actually bold publishing experiments. 🙂

Anyhow, I guess hiding the conclusions would work better in experimental than mathematical fields.

“And hence, by combining Lemma 17 with Corollary 35, we can finally establish the main theorem, whose statement we are not at liberty to disclose.”
Anonymous Says:
Comment #5 June 10th, 2006 at 11:09 am
I love the Amazon.com book rating system, and would be happy if it were installed on arxiv, etc. I like
(1) possibility of anonymity
(2) asking reviewers to give both a comment and 1-5 star rating
(3) allowing readers to opine whether a review was helpful or not
(4) menu for ordering reviews (by increasing number of stars, decreasing, more than 3 stars, less than 3 stars, etc.)

Is there anything to dislike about the amazon.com system?
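
For concreteness, here is a minimal sketch of what features (1) through (4) might look like as a data model. It is not Amazon's or arXiv's actual implementation; the names `Review`, `by_stars`, `with_more_than`, and `helpfulness` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Review:
    stars: int                    # (2) a 1-5 star rating...
    comment: str                  # ...together with a written comment
    helpful: int = 0              # (3) readers who marked the review helpful
    votes: int = 0                # (3) readers who voted either way
    author: Optional[str] = None  # (1) None means the reviewer stayed anonymous


def by_stars(reviews: List[Review], descending: bool = False) -> List[Review]:
    """(4) Order reviews by increasing (default) or decreasing star count."""
    return sorted(reviews, key=lambda r: r.stars, reverse=descending)


def with_more_than(reviews: List[Review], stars: int) -> List[Review]:
    """(4) Keep only the reviews with strictly more than `stars` stars."""
    return [r for r in reviews if r.stars > stars]


def helpfulness(review: Review) -> str:
    """(3) Summarize the reader votes on a single review."""
    return f"{review.helpful} out of {review.votes} readers found this review helpful"
```

Calling `by_stars(reviews, descending=True)` would give the "decreasing" menu option, and `with_more_than(reviews, 3)` the "more than 3 stars" one.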
wolfgang Says:
Comment #6 June 10th, 2006 at 11:34 am
Scott,

You could be a pioneer and post a link to your next preprint on your blog and invite comments before it gets published.
I saw at least one guy doing this (but I forgot the location).
Robin Hanson Says:
Comment #7 June 10th, 2006 at 11:39 am
Scott, it seems to me that anonymous conclusions are feasible also in theoretical papers, though I’ve never actually tried it. Instead of leaving the entire conclusion sentence blank, you could just leave a certain field blank, or put in three possible answers. As in “We find that welfare is {increasing/decreasing/constant} in the number of firms competing for the contract.”
wolfgang Says:
Comment #8 June 10th, 2006 at 11:52 am
I think the multiple choice proposal is great. One could actually let the reviewer(s) choose one and then publish the version most acceptable to the majority of reviewers.

This could work especially well in topics such as the string theory landscape.
Anonymous Says:
Comment #9 June 10th, 2006 at 2:48 pm
Seems to me that this is already done at

http://cosmocoffee.info/viewforum.php?f=2

at least for astro-ph papers. Maybe that’s where Nature got the idea?
Anonymous Says:
Comment #10 June 10th, 2006 at 4:26 pm
I’m concerned about potential payback. I very rarely get upset about referee comments. Over the years there have been perhaps two or three times that I got really ignorant comments on my papers, e.g. “results are wrong, can easily be improved”; then said reviewer works for two years on the problem and fails to improve the results. I must admit I had a hard time staying level-headed on that one.

Scott, have you heard about the SIGMOD reviewing system? Does that one preserve anonymity?
Scott Says:
Comment #11 June 10th, 2006 at 5:22 pm
you could be a pioneer and post a link to your next preprint on your blog and invite comments before it gets published.

Indeed I did that with my last paper, and I’ll do it with the next one as well.
Scott Says:
Comment #12 June 10th, 2006 at 5:40 pm
Instead of leaving the entire conclusion sentence blank, you could just leave a certain field blank, or put in three possible answers. As in “We find that welfare is {increasing/decreasing/constant} in the number of firms competing for the contract.”

If the reviewers can’t fill in the blank, how closely could they have read the analysis? Another example might make the point clearer:

“We find that the primes (contain/do not contain) arbitrarily long arithmetic progressions.”

The problem seems inherent to me, since the whole point of a math paper is that it reveals why things couldn’t possibly have been otherwise.
Robin Hanson Says:
Comment #13 June 10th, 2006 at 7:30 pm
Scott, in a field where referees of math papers are in the habit of actually reading the proof (alas not my field), the referee process would be in two stages. In the first stage the proof would be absent and the conclusion hidden. If the paper is accepted in that first stage, then in the second stage the proof and conclusion would be given – the only reason for rejection at that stage would be if the proof did not actually demonstrate the conclusion.
Scott Says:
Comment #14 June 10th, 2006 at 7:51 pm
Robin, I suspect what’s really coming out of this interchange is a cultural difference between economics and CS theory. In econ, the basic questions are what’s being modeled, whether the assumptions are realistic, etc. — not whether the proof techniques are elegant or original. In CS theory, by contrast, the proof techniques are often (though not always) the whole point. You read the proof not only to verify correctness, but to look for originality or insight.
Anonymous Says:
Comment #15 June 11th, 2006 at 12:30 am
A better question (only partly tongue-in-cheek): if someone posts a paper, advertises it, and no one reads it, can we agree that the paper is simply not interesting to a wide enough audience and thus not worth publishing?

Of course, this might eliminate roughly 75% of published papers…
Anonymous Says:
Comment #16 June 11th, 2006 at 4:00 am
In CS theory, by contrast, the proof techniques are often (though not always) the whole point.

IMHO, far too often. This is the path to irrelevance. Computer Science is not mathematics. It derives its inspiration from practical problems with real life implications. The formula to accept a paper should include:

– real-life relevance: how useful is the result outside theory

– theory relevance: how likely it is that either the result or the proof technique will be used elsewhere in theory

– originality: how new is the area of study, proof technique or line of attack

– difficulty: worth 1/4 of the marks, how difficult it is to reproduce the result independently
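
One way to read the list above is as a weighted score. The sketch below is only an illustration: the 1/4 weight on difficulty comes from the list, while the equal split of the remaining 3/4 among the other three criteria, the 0-to-1 rating scale, and the names `WEIGHTS` and `paper_score` are all assumptions.

```python
from typing import Dict

# Only the 1/4 weight on difficulty is stated in the list; giving each of the
# other three criteria 1/4 as well is an assumption made for this sketch.
WEIGHTS: Dict[str, float] = {
    "real_life_relevance": 0.25,
    "theory_relevance": 0.25,
    "originality": 0.25,
    "difficulty": 0.25,
}


def paper_score(ratings: Dict[str, float]) -> float:
    """Weighted sum of per-criterion ratings, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name} must be between 0 and 1")
    return sum(weight * ratings[name] for name, weight in WEIGHTS.items())
```

Under these assumed weights, a paper that is maximally difficult but scores zero on everything else gets 0.25, which is the sense in which difficulty is “worth 1/4 of the marks”.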
Anonymous Says:
Comment #17 June 11th, 2006 at 4:04 am
if someone posts a paper, advertises it, and no one reads it, can we agree that the paper is simply not interesting to a wide enough audience and thus not worth publishing?

This would be a really bad idea as popularity is a bad indicator of relevance. I can think of several subareas of theory that were once widely popular and nowadays are by and large deserted.
Robin Hanson Says:
Comment #18 June 11th, 2006 at 9:22 am
Wow – so you can’t tell if the proof technique is interesting from just a description of the kind of conclusion-assumption relationship that it was capable of delivering? Perhaps then an author would have to describe the kind of technique he used for the first stage evaluation. Then another valid reason for rejection would be that his proof was not of the type he claimed.
Anonymous Says:
Comment #19 June 11th, 2006 at 1:10 pm
One potential problem with blog refereeing is the self-selection of people leaving comments. In other words, instead of assigning refereeing responsibilities to a specific person, a free-for-all approach may mean that only negative responses from competitors are self-selected, for example.

In other words, we can end up with a situation where only people with vested interests (or an axe to grind) will post, while more impartial readers may decide not to post anything.

It may also polarize the review process. You can see it on blogs and internet discussion forums – controversial topics get a lot more comments, and some people leave a lot more comments, while others may be more introverted – they may agree or disagree with the post or a paper, but keep their ideas to themselves.

So overall I am not terribly optimistic about the blogging approach to peer review. I think it opens a number of possibilities for unfair “ganging up” on a certain paper by certain groups of competitors, for example, or the other way around: authors may ask their friends and colleagues to post positive reviews in an attempt to influence the decision of the reviewer.

There are also a lot of advantages offered by the anonymity of the referee process, and while Nature is correct in making posters reveal their names (would those be checked, by the way?), it may prevent certain people from saying negative things or blogging altogether, at the same time encouraging more participation from people who don’t particularly care about their reputation: unaffiliated “crackpots”, for example, or unqualified laymen who like the sound of their own voice.

Let’s put it this way: let’s say you are giving a talk and there are 100 people in the audience. Let’s say the talk represents excellent science, and if you poll people afterwards, 90 out of 100 will say that this is a good talk. However, there’s always a person or two who like to ask asinine questions and undermine the presenter’s credibility. Some like the sound of their voice and for some reason think that disbelieving everything somehow makes them a better scientist. I know at least a couple of people like that.

A disproportionate number of questions are asked by people like that, while more reasonable scientists may not ask anything at all. The blogging approach seems to be weighted towards the “question-asking people”, who could be just a few people in the audience, while neglecting to ask what the other 98 people in the audience think of your research.
Anonymous Says:
Comment #20 June 11th, 2006 at 1:11 pm
Robin,

Your system does exist in the following sense.

The first stage is when the author solicits comments by expositing on his proof techniques, e.g. talks, seminars, circulation of drafts for feedback, etc. The second stage is then the actual review of the paper, itself having been modified with the aid of said feedback.

A description of “the kind of conclusion-assumption relationship that [a proof technique] was capable of delivering”, without the actual availability of the proof itself, is normally referred to as a “conjecture”, a “sketch” of a purported proof, or, on a grander scale, a “program”. Papers can be written on this basis, but the interest in these papers is then in whether and how the problems posed can be solved, and that would be the topic of subsequent papers.

Scott’s example from prime number theory is the work of Green-Tao, who answered the question in the affirmative. Another example is Andrew Wiles’ proof, which may perhaps give a better illustration.

You may recall that Wiles announced his proof in a series of lectures in Cambridge. This would correspond to your “first stage”, and after the lectures, it was reported that the audience was apparently convinced that Wiles had laid FLT to rest. Granted, Wiles’ lectures were detailed, but even talks and humans can only go so far, in terms of details and stamina respectively.

It was only in the actual review stage, your “second stage”, that the reviewers found a huge flaw in Wiles’ work, which took many months to fix (even with a collaborator, Richard Taylor, on board), and which the audience of Wiles’ talks apparently were not able to spot.

This illustrates how subtle errors can still creep in, despite the proof “technique” looking legitimate at the gross level. In fact, apparently a “wrong” technique was used, but somehow this wasn’t spotted at the “first stage”, which shows how a technique can seem interesting or plausible even when it turns out to be used inappropriately.

In short, at least in mathematically-oriented papers, the focus of the very review of the paper itself is in the “second stage”: spotting the subtle errors in the proof techniques used. Certainly, gross errors will be picked out as well, if there are any, but careful authors would have worked hard to avoid them. Formalizing the “first stage” would then make less sense given this situation.
Anonymous Says:
Comment #21 June 11th, 2006 at 2:34 pm
It derives its inspiration from practical problems with real life implications. The formula to accept a paper should include:

– real-life relevance: how useful is the result outside theory

I actually laughed out loud at reading this. You do know that this is the blog of someone who writes papers about quantum computing and oracles, right?

I think it was Karp who summed up complexity theory with “if pigs can fly, then elephants can dance”. That pretty much describes how much relevance most of it has to, well, anything.

As for the topic of this blog post, I think it remains to be seen where this goes. There are certainly a lot of problems that can arise. However, I applaud Nature (of all journals!) for actually trying something new.
Anonymous Says:
Comment #22 June 11th, 2006 at 5:48 pm
I actually laughed out loud at reading this. You do know that this is the blog of someone who writes papers about quantum computing and oracles, right?

And unless quantum computing can deliver something practical within the next five to ten years it will be as popular then as, say, PRAMs are today.
Anonymous Says:
Comment #23 June 11th, 2006 at 6:03 pm
I think it was Karp who summed up complexity theory with “if pigs can fly, then elephants can dance”.

Don’t tell me you were gullible enough to believe that, coming from the author of the Karp-Rabin string matching algorithm and the Hopcroft-Karp bipartite matching algorithm, both of which are extensively used in practice. The same Karp who has spent the last ten years or so publishing algorithmic results in bioinformatics conferences. You didn’t think he was serious, did you?
Anonymous Says:
Comment #24 June 11th, 2006 at 11:26 pm
You didn’t think he was serious, did you?

Of course he was. The fact that Karp has done work in both complexity theory and algorithms doesn’t mean they’re the same thing.
Anonymous Says:
Comment #25 June 11th, 2006 at 11:57 pm
I think it was Karp who summed up complexity theory with “if pigs can fly, then elephants can dance”.

Don’t tell me you were gullible enough to believe that…

You’re gullible if you think that Karp meant this as an insult to complexity theory.

The “distaste” for complexity theory in TCS is, much like the “distaste” for gay rights in some parts of this country, a symptom of people who don’t come much into contact with complexity theorists (resp., gay people). You will not see such ridiculousness in any of the top 10 CS departments in the US.
Scott Says:
Comment #26 June 12th, 2006 at 3:43 am
Wow – so you can’t tell if the proof technique is interesting from just a description of the kind of conclusion-assumption relationship that it was capable of delivering?

Definitely not. Sometimes the conclusion will look exciting, but the proof will have nothing going on — e.g., just applying a standard result from a different field. Other times the conclusion will look laughably arcane, but to prove it the authors have to introduce an amazing technique that might later revolutionize the field.
Anonymous Says:
Comment #27 June 12th, 2006 at 12:10 pm
Don’t tell me you were gullible enough to believe that

Look, I do complexity theory myself. There’s no contradiction between acknowledging that most of it has very little relevance to anything practical and wanting to work in that field. Personally I just find it fascinating, and I suspect Karp does too.

You’re pretty dumb (I normally don’t insult people outright, but you started this) if you think algorithm design automatically qualifies as complexity theory. In that sense, all CS is complexity theory. Karp’s string matching algorithm is hardly any more complexity theory than an algorithm for optimizing SQL statements or one for doing facial recognition of middle-aged men is.

The same Karp who has spent the last ten years or so publishing algorithmic results in bioinformatics conferences.

I guess you missed the “bioinformatics” part. Bioinformatics algorithms are indeed useful. Oracle QC proofs, normally not so much.
Sideline Engineer Says:
Comment #28 June 16th, 2006 at 3:20 pm
Non-anonymous review? Try it and see (on a larger scale than just one person). There’s a reason that social scientists do experiments instead of proving lemmas.

Shtetl-Optimized
