Daniel Lemire's blog


Citogenesis in science and the importance of real problems

13 thoughts on “Citogenesis in science and the importance of real problems”

  1. Maybe this is an instance of what you’re saying:

    http://stackoverflow.com/questions/504823/has-anyone-actually-implemented-a-fibonacci-heap-efficiently

    Theory says that Dijkstra's shortest-path algorithm performs best with a Fibonacci heap, but some experiments disagree.

  2. @Alejandro

    It is a benign example of what I am pointing out. There are indeed countless engineering papers written every year using the Fibonacci heap. It is unclear *why* they use the Fibonacci heap because we now have overwhelming evidence that it is not worth it.

    However, there are many hidden Fibonacci heaps out there in research papers.

  3. Denzil Correa says:

    Is this also a BIG reason why authors don’t make their source code publicly available? I believe it is.

  4. JeffE says:

    Fibonacci heaps are still cited only because authors are too lazy to look past their textbooks to the more recent literature, where simpler, faster, and more practical data structures with the same theoretical guarantees have been known for years. (Fibonacci heaps are still cited in textbooks because _textbook_ authors are too lazy to look past _their_ textbooks to the more recent literature, where etc.)

  5. Steve says:

    The point where things completely come full circle is when the engineers come out with an open source package/library (e.g., pyX) that implements several of the techniques, including X, X+, and X++, making it easy for researchers to try them all.

  6. James says:

    From my personal experience, it’s the lack of experience with, and exposure to, better or more advanced techniques that leaves these methods on the shelf.
    The cognitive effort required to apply them, let alone come up with them, is prohibitive to their adoption, and yet most developers I know would rather work things out from first principles and use inappropriate levels of abstraction, more for macho purposes than practical ones.
    My 2c. 😉

  7. Jens Teubner says:

    It is even worse. Suppose I find that X+ is no better (or even worse) than X and thus mention X+ negatively in my paper. Very quickly, my mention will be reduced to “a citation” and will increase X+’s citation count. Even with my negative result, I’ll actually help promote X+!

  8. This sounds analogous to how software bloat can accumulate: http://en.wikipedia.org/wiki/Software_bloat

  9. @Jens

    True. Good point.

  10. Jo Vermeulen says:

    Doesn’t this all come down to the need for more replication of findings in CS research, and the related problem of actually valuing this type of article?

    Last year, there was a panel at the CHI conference on the topic of replication in the Human-Computer Interaction field.

    See also: Graduate Student Perspectives on replication

  11. Derek Jones says:

    There are whole subfields that are essentially fake, and others that have long since stopped going anywhere, e.g., mutation testing.

    It’s not surprising that industry laughs at academic research in software engineering.

  12. me says:

    It is even worse.
    Negative results are not only harder to publish, but they will also not receive citations. One could argue that we could and should publish negative results at least on arXiv. Or make a journal dedicated to negative results.
    But citations tend to matter even more than the number of papers you publish. So even if we did publish negative results, they would not advance our careers, and we could make better use of the time spent on the write-up.
    What we really need to do is publish the source code of failed reproductions by contributing it to tools, so that others can more easily see that some algorithm does not work as promised.

  13. David Fetter says:

    No small part of the obsession with novelty comes from the political economy.

    Things that are “novel” are ones that can be made into property, and thus can most easily have economic rents extracted from them under our current system, so that’s where funding goes, and where prestige goes. Where there are incentives, we should not feign surprise when people respond to them and optimize their outlook and conduct in whatever way they have reason to believe maximizes their advantage.