Daniel Lemire's blog


Should computer scientists run experiments?

5 thoughts on “Should computer scientists run experiments?”

  1. Anonymous says:

    Daniel — you’re not the “villain” in my story. But I disagree with (what I interpret as) your criticism of models that do not model absolutely every aspect of a computation. Would you dismiss computing the communication cost of a MapReduce algorithm, even though there are other costs involved (but communication cost typically dominates)? Do you dismiss any claim of a big-oh or big-omega bound, because no such claim can ever be falsified by a finite experiment?
    –Jeff Ullman

  2. _Would you dismiss computing the communication cost of a MapReduce algorithm, even though there are other costs involved (but communication cost typically dominates)?_

    No.

    _Do you dismiss any claim of a big-oh or big-omega bound, because no such claim can ever be falsified by a finite experiment?_

    If you want to be formal about it, no model can ever be proven wrong.

    We both know that, in practice, if you plot the performance curves and you get something that is quadratic in n… you have effectively disproved the n log n model.

    You may argue that formally, I have disproved nothing… but the point of a scientific model is to tell you something about the real world. If you constantly get quadratic curves… you can argue until you are blue in the face that it is really n log n… engineers are going to reject your performance model and use a better one.

    So falsification is not a black-and-white thing. Rather, in practice, it becomes harder and harder to fit a failing model to the facts. Now, that’s assuming you have a scientific model. If it is not a scientific model, then you can always fit it to any fact… because facts don’t matter. (See the curve-fitting sketch after the comments.)

  3. JeffE says:

    Big-O notation may very well be _practically useless_, but it is certainly not _meaningless_, even in the context of a universe with a finite number of elementary particles.

    _We both know that, in practice, if you plot the performance curves and you get something that is quadratic in n… you have effectively disproved the n log n model._

    No, we don’t. We both know that in practice, if one sorting algorithm has better performance on real-world data, then it has better performance on real-world data, by definition. But that’s not a falsification of the asymptotic complexity model, because that’s not what the asymptotic complexity model describes.

    It’s not the _model’s_ fault if you’re using it wrong.

  4. @JeffE

    If you want to be formal about it, then big-O notation is a purely mathematical construct that has no bearing on reality.

    Take sorting… just artificially pad every input array so that it has at least 10^10 elements. You haven’t changed the big-O bound, but you have drastically altered the real-world performance in all practical cases.

    If you insist on this interpretation, then yes, big-O notation belongs to math departments.

    But that is not how it is used. Clearly, Jeffrey’s intent is that his big-O analysis has some bearing on real-world performance in this universe. (See the padding sketch after the comments.)

  5. Vladimir N says:

    A model’s predictions and experimental results are separate things, but they are not unrelated. What the model says, if it is constructed under assumptions similar to those that hold for the experiment, is evidence about what the experimental results will be. And the shape of the experimental results is evidence for the existence of a reasonable model that would predict it.
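
The exchange in comment 2 turns on a concrete procedure: time an algorithm at several input sizes and see which growth model fits. Here is a minimal Python sketch of that curve-fitting argument; the choice of algorithm, the input sizes, and the helper names are illustrative assumptions, not anything from the thread.

```python
# Illustrative sketch (not from the thread): fit the one-parameter models
# t = c * n*log(n) and t = c * n^2 to measured running times of a
# deliberately quadratic sort, and compare the residuals.
import math
import random
import time

def insertion_sort(a):
    # Average-case and worst-case quadratic, so the n^2 model should win.
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def fit_residual(sizes, times, model):
    # Least-squares fit of t = c * model(n): c = sum(t*f) / sum(f*f).
    fs = [model(n) for n in sizes]
    c = sum(t * f for t, f in zip(times, fs)) / sum(f * f for f in fs)
    return sum((t - c * f) ** 2 for t, f in zip(times, fs))

sizes = [200, 400, 800, 1600, 3200]
times = []
for n in sizes:
    a = [random.random() for _ in range(n)]
    start = time.perf_counter()
    insertion_sort(a)
    times.append(time.perf_counter() - start)

print("n log n residual:", fit_residual(sizes, times, lambda n: n * math.log(n)))
print("n^2 residual:    ", fit_residual(sizes, times, lambda n: n * n))
```

On any ordinary machine the n^2 residual comes out orders of magnitude smaller: the n log n model has not been formally disproved, but no engineer would keep it.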
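The padding argument from comment 4 can be made runnable too. The sketch below scales the comment’s 10^10 down to 10^6 elements, an assumption made purely so the example fits in memory: pad_sort has the same O(n log n) worst-case bound as Python’s built-in sorted, yet it is hopeless at every practical input size.

```python
# Illustrative sketch (scaled down from the comment's 10^10 to 10^6):
# padding to a fixed size leaves the asymptotic bound untouched while
# wrecking real-world performance.
import random
import time

PAD = 10**6  # the comment says 10^10; that would not fit in memory here

def pad_sort(a):
    # Pad up to PAD elements before sorting. PAD is a constant, so the
    # bound as n grows is still O(n log n), but any input with fewer
    # than PAD elements now pays for sorting PAD elements.
    padded = a + [float("inf")] * max(0, PAD - len(a))
    return sorted(padded)[: len(a)]  # the infinities sort to the back

a = [random.random() for _ in range(100)]

start = time.perf_counter()
sorted(a)
print("plain sort:", time.perf_counter() - start)

start = time.perf_counter()
pad_sort(a)
print("pad sort:  ", time.perf_counter() - start)  # vastly slower
```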