

Innovative ideas are indistinguishable from crackpot ones

It is impossible to objectively and systematically distinguish bogus work from high-quality work. You can sort work based on external attributes such as the quality of the presentation, length, logical correctness, the prestige of the authors, and methodology, but not on the significance of the work. Significance cannot be disproved at the time of the review. Even technical details can end up being fundamental ideas: this happens frequently in mathematics, where lemmas often outshine theorems in the long term.

I review several research papers every month and several research funding proposals every year. At best, I can determine that something is badly presented, or I can find logical or mathematical errors. Beyond this, my opinion is probably wrong as often as not.

Here are a few things that I have categorized, or would have categorized, as crackpot ideas:

  • Back in 1990, I would have predicted that the WWW was impractical. How can you deal efficiently with broken links? Who is going to maintain all these links? Yet, it works. I almost never encounter a 404 (missing page) error.
  • Back in 1991, I would have laughed had anyone told me that you could efficiently index and categorize over 8 billion dynamic Web pages, many of which appear and disappear frequently. Yet Google, Yahoo and many other search engines are able to index the content of my posts daily. They differentiate my content from webspam. They also determine the authority of my page. Yet, there is no central registry, no form of quality control, and so on. While they use technically sophisticated techniques, much of it works simply by brute force: keep revisiting and reindexing the sites you expect to change.
  • Not long ago, I concluded that Twitter was a useless idea. Months later, I realized that Twitter offers ambient collaboration. I believe it caters to an essential need that had gone mostly unnoticed previously. (If you are not on Twitter, you ought to be.)
  • The first time I read about bitmap indexes, I thought they were a clever but limited technical trick with little scientific interest. (I just published two papers on bitmap indexes and I have more on the way! See the bitmap sketch after this list.)
  • Jim Gray’s data cube idea is to work with a lattice of 2^d cuboids. Since, in data warehouses, d is often large (d > 15), materializing even a small fraction of these cuboids is impractical. Yet the idea has been very fruitful both in industry and in academia. (See the counting sketch below.)
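
To make the bitmap-index bullet concrete, here is a minimal, uncompressed sketch in Python. The toy table, the column names and the helper build_bitmap_index are all invented for this example; real bitmap indexes compress their bitmaps (with run-length encoding, for instance), which is where much of the research effort goes.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to an integer used as a bitset over row numbers."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row  # set the bit for this row
    return index

# Two columns of a toy four-row table.
city = ["Paris", "Lyon", "Paris", "Lyon"]
year = [2008, 2008, 2009, 2008]

city_index = build_bitmap_index(city)
year_index = build_bitmap_index(year)

# Rows where city == "Lyon" AND year == 2008: a single bitwise AND.
matches = city_index["Lyon"] & year_index[2008]
print([row for row in range(len(city)) if (matches >> row) & 1])  # [1, 3]
```

The appeal, which I initially underestimated, is that a conjunctive query reduces to a single bitwise AND over precomputed bitmaps.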

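In the same spirit, here is a back-of-the-envelope sketch of Gray’s lattice, with dimension names invented for the example: there is one cuboid (one GROUP BY) per subset of the d dimensions, hence 2^d cuboids in total, which is why materializing them all is impractical for large d.

```python
from itertools import combinations

def cuboids(dimensions):
    """Enumerate every cuboid of the lattice, one per subset of the dimensions."""
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)

dims = ["store", "product", "date"]
print(list(cuboids(dims)))  # 2**3 = 8 cuboids, from () up to all three dimensions
print(2 ** 16)              # with d = 16 dimensions, already 65536 cuboids
```
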
Fortunately, if you merely discard the papers that fail these objective checks, you already discard quite a number! Requiring papers to be well written and free of logical flaws is often a harsh enough filter!

Anyhow, there must be some link to evolutionary theory. I am sure there have been new species that initially seemed of little interest, but ended up being of crucial importance.

For an entertaining take on this problem, see: Simone Santini, We are sorry to inform you…, IEEE Computer, December 2005.