I have to agree with you, Daniel. Even in very prestigious conferences, you find papers whose results are pretty difficult to reproduce. At the last ICML there was a paper on Markov processes which took a year to simulate on a large server. I’m not questioning their integrity, but who on earth is going to divert that kind of resources to check whether it is right or not? Even if they provided the software, claiming that you need a year and a cluster to get those results clearly dampens the fact checking.
Ao says:
I think you may be generalizing based upon your particular area. Setting aside CS theory, in which proofs are obviously required for publication (yes, sometimes bugs in proofs are missed, but they are often followed up on), we can look at several experimental areas where work is checked. Security research is a good example of this. Many experimental security papers (such as those at USENIX Security) offer attacks against papers that appeared the previous year. This is exactly the self-correcting approach we should see. In some other CS disciplines where designs are more important than specific experimental results, the experimental results are there only to sanity-check that the design doesn’t blow up when implemented.
Itman says:
Leon, machine learning is especially difficult. Quite often, results are not reproducible, because what works on one data set fails miserably on another.
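As a rough illustration of that point (a synthetic sketch in Python, not any particular paper’s experiment): a model fitted on one data set can look excellent there and still fall apart on a second data set whose feature distribution has shifted.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def two_class_data(n, shift):
        # Two Gaussian classes in 5 dimensions; `shift` moves the class means,
        # mimicking a second data set drawn from a different population.
        X0 = rng.normal(loc=0.0 + shift, scale=1.0, size=(n, 5))
        X1 = rng.normal(loc=1.0 - shift, scale=1.0, size=(n, 5))
        return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

    X_a, y_a = two_class_data(500, shift=0.0)   # "data set A": what the model is tuned on
    X_b, y_b = two_class_data(500, shift=0.8)   # "data set B": same task, shifted features

    model = LogisticRegression().fit(X_a, y_a)
    print("accuracy on A:", model.score(X_a, y_a))   # high
    print("accuracy on B:", model.score(X_b, y_b))   # typically far lower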
Daniel, I think that CS is pretty good. In my experience, results are generally reproducible. Yet code is a big problem. It is not only that authors do not provide it; journals also do not pay enough attention to publishing source code in the best possible way. I have an example where a journal contractor botched the source code by repacking it from tar.gz to zip, but refused to redo it properly. (Too much work, and contractors are overworked.)
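We are not told exactly what the repacking broke, but one plausible failure mode, sketched below in Python purely as an illustration, is that a zip round trip silently drops Unix metadata such as executable bits (and symlinks), so the extracted code no longer builds as shipped.

    import os, stat, tarfile, tempfile, zipfile

    work = tempfile.mkdtemp()
    script = os.path.join(work, "build.sh")
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho building\n")
    os.chmod(script, 0o755)                      # executable, as the authors shipped it

    # The original tar.gz records the mode, and tarfile restores it on extraction.
    with tarfile.open(os.path.join(work, "code.tar.gz"), "w:gz") as tar:
        tar.add(script, arcname="build.sh")

    # "Repacking" as a zip and extracting it drops the executable bit:
    # zipfile.extractall() creates files with default permissions.
    zpath = os.path.join(work, "code.zip")
    with zipfile.ZipFile(zpath, "w") as z:
        z.write(script, arcname="build.sh")
    out = os.path.join(work, "unzipped")
    with zipfile.ZipFile(zpath) as z:
        z.extractall(out)

    mode = stat.S_IMODE(os.stat(os.path.join(out, "build.sh")).st_mode)
    print(oct(mode))   # typically 0o644 (umask-dependent): the +x flag is gone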
From an interested outsider’s perspective, one sees a large number of ‘results’ in the popular media that are counter-intuitive, even if one is following the domain. Those apparent fluctuations make one deeply suspicious of the underlying rigor and motivations applied to the work…
Paul.
@Beetle
I have expressed my position on peer review in this earlier blog post: Peer review is an honor-based system.
“By design, peer review doesn’t catch fraud. Why do people keep expecting it should?”
It does not catch fraud, nor many of the non-obvious errors that are sure to plague our papers.
So, with you, I ask why would people take journal peer review as the standard of truth?
“(…) are you seriously suggesting referees should be well versed in those statistical tests of data?”
I am sure I review about 50 research papers and a few theses, and I grade countless papers… do you honestly think I will redo the statistical analysis on all this work? When I get a 300-page thesis to review, do you think I will redo all the proofs to make absolutely sure they are correct?
Since I can’t do it, I don’t expect others to do it.
Beetle B. says:
When the Jan Hendrik Schön case erupted, people kept asking whether peer review had failed. I remember quite clearly Herbert Kroemer’s (Nobel laureate, and member of the committee that investigated him) response to the IEEE.
He said it’s not the role of peer review to catch these things (unless they’re blindingly obvious). What he said could be paraphrased as:
“… what is critical is that traditional peer review does not protect against fraud. It is merely a check that the work appears superficially correct and interesting. A reviewer who would go out of his way to check whether a paper reports truthful results should not expect accolades. That is not how the game is played.”
You make it sound like a criticism, but that’s precisely what peer review is supposed to be. Referees are not supposed to repeat the experiment, nor are they supposed to spend a great deal of effort in catching fraud.
By design, peer review doesn’t catch fraud. Why do people keep expecting it should?
Now, I didn’t look into the cases you mentioned – perhaps there were statistical anomalies, but are you seriously suggesting referees should be well versed in those statistical tests of data? It wouldn’t surprise me if more problems were caused by referees misapplying statistics than by the current system (i.e., more false positives than actual frauds).
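(For concreteness, one of the simpler such tests, sketched here in Python with invented numbers, is a chi-square check that the last digits of reported measurements are roughly uniform; applied naively, say to rounded or instrument-limited data, it will happily raise false alarms.)

    import numpy as np
    from scipy.stats import chisquare

    def terminal_digit_pvalue(values):
        # Compare the counts of last digits against a uniform distribution with a
        # chi-square goodness-of-fit test; in practice you would want far more
        # values than this toy list before the test means anything.
        last_digits = [int(str(v)[-1]) for v in values]
        counts = np.bincount(last_digits, minlength=10)
        return chisquare(counts).pvalue

    reported = [1037, 1123, 981, 1214, 1057, 993, 1108, 1042, 1179, 1066]  # invented numbers
    print(terminal_digit_pvalue(reported))   # a small p-value is a hint, never proof of fraud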
I do agree researchers should release their data and code – however those are to serve the readers of the journals, not the referees.
@Beetle, isn’t the purpose of academia to arrive at the truths in our world? If people are just dumping stuff out there with no concrete basis, isn’t that just opinion, not science?
I do not expect a world where there are no mistakes, and I know it takes time for us to gradually converge on the underlying truths, but when we set up systems to validate our inquiries, and they don’t, I have to wonder what their real purpose is?
Beetle B. says:
@Daniel,
My apologies. It seemed like you were actually criticizing peer review.
“So, with you, I ask why would people take journal peer review as the standard of truth?”
Now you’re making assumptions about me. Ask those people, not me.
@Paul
“but when we set up systems to validate our inquiries, and they don’t, I have to wonder what their real purpose is?”
Again, I have to point out that peer review was never meant to validate an inquiry (maybe in mathematics, but not in the sciences).
Peer review is just one part of the QA process, just like the person who does the layout for the journal. You wouldn’t expect him/her to catch these issues, because it’s not their job to. Nor is it the purpose of referees to catch statistical anomalies unless they’re obvious to someone not trained in statistics.
I’m not suggesting these shouldn’t be caught – just don’t put it on the referees to do so.
@Beetle
I think that even among mathematicians, it is understood that the primary responsibility for the correctness of proofs lies with the author. If the proof “looks correct”, nobody will blame the journal. It can be extremely hard to ensure that there is no flaw in a non-trivial proof. Certainly, some major results are checked very carefully, but mathematics journals are filled with relatively minor results most people don’t care about (check citation rates, and then realize that even if you cite a paper, you may not even have read it).
When I was a young graduate student, I made the mistake of assuming that any misunderstanding while reading a published proof was my fault. I now know that journals are filled with faulty proofs.
This does not stop mathematics from moving forward because most of these proofs are non-critical, and even a faulty proof can be tremendously useful.
Even flawed science can be useful.
At least for CS, we should all sign the Science Code Manifesto http://sciencecodemanifesto.org/