
MIT fires associate professor for making up data

Slashdot points out this CNN article where we learn that the Massachusetts Institute of Technology fired an associate professor for falsifying research data. The fellow is named Luk Van Parijs; a quick Google search doesn’t bring up his home page, and even archive.org has no trace of him. However, we do get the news bite on MIT’s web site. This case seems very similar to Jan Hendrik Schön’s: he was publishing one paper every 8 days and making up the data as he went.

Two years ago, I myself caught a Korean Ph.D. student who was publishing 15 papers a year, most of them copies of existing papers, sometimes not even his own. I caught him by using Google: I was a referee for a paper he submitted, and after doing a Google search on a key phrase he used, I found out that he had paraphrased, from start to finish, a paper published by American authors in a Canadian workshop two years earlier. I reported the problem rather widely, if not publicly. Last time I checked, the student was still working toward his Ph.D., so, clearly, his school didn’t think it was a big deal. His school never got back to me, not even to acknowledge the report I sent. The student should have been dismissed from the Ph.D. program at once, if you ask me.
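(As an aside, the check I did by hand is easy to script. Here is a minimal sketch in Python that pulls a few distinctive phrases out of a manuscript so you can paste them, in quotes, into a search engine. The scoring heuristic, favoring longer sentences built from words that are rare within the document, and the file name submission.txt are my own illustrative choices, nothing more.)

    import re
    from collections import Counter

    def candidate_phrases(text, count=5, min_words=8):
        """Return a few phrases worth searching for verbatim, most distinctive first."""
        # Crude sentence split; good enough for plain English prose.
        sentences = re.split(r"(?<=[.!?])\s+", text)
        # Word frequencies over the whole document.
        freq = Counter(re.findall(r"[a-z']+", text.lower()))

        def score(sentence):
            tokens = re.findall(r"[a-z']+", sentence.lower())
            if len(tokens) < min_words:
                return 0.0
            # Words that are rare in the document make a phrase more searchable.
            return sum(1.0 / freq[t] for t in tokens)

        ranked = sorted(sentences, key=score, reverse=True)
        return ['"%s"' % s.strip() for s in ranked[:count] if score(s) > 0]

    if __name__ == "__main__":
        with open("submission.txt") as f:  # hypothetical input file
            for query in candidate_phrases(f.read()):
                print(query)  # paste each line, quotes included, into a search engine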

I suspect that fraud is far more widespread than most people realize. When I write a paper, I spend about 80% of my time collecting experimental data. If I were to make up the data, I could publish 5 times faster. The incentive to cheat is significant.

What is worse is that you are unlikely to get caught. Most papers are never read thoroughly and results are almost never reproduced. But even so, when you catch someone, where do you report them? What do you have to gain by reporting them? There is also a grey area where the author misleads you, but you can’t quite call it fraud. In Computer Science, I found that trying to reproduce results from papers I read is often a frustrating and expensive experience. Most often, you don’t have enough details to reproduce the results accurately, and when you do have enough details, you often can’t match the reported results. This is why I mostly look at the theoretical analysis: you can’t easily falsify theory; all you can do is copy it. And even when you can reproduce the experimental results, you often find out that the author cheated a bit. How do authors cheat? By conveniently forgetting to include the cases where their results are not good.

I say this is cultural. Author Z reports that technique X is best. If you ever come along, implement technique X, and report that, after all, it doesn’t work so well, you will never be able to publish your negative result in as prestigious a venue. In this sense, the author who puts the results in the most positive light possible will always win. The same holds for grant applications: your research must be guaranteed to deliver outstanding results or you will not receive grant money. Honesty is definitely not an important value in our community at large. Pure Mathematics and Theoretical Computer Science are probably lucky exceptions.

On the positive side, if I ever catch someone at MIT in serious wrongdoing, I now know the school may come down hard on that person. This is probably a serious warning to anyone working at MIT. I don’t know whether most schools would come down as hard as MIT did on someone who cheated in order to get grant money. I have serious doubts, especially if the person is a rising star.