Further, Jeff Ulmann complained about people using easy test sets that show their algorithm working, while, in the real word this would not happen. Some, even cheat.

The same problem is with Big-O notation. Computer scientists derive new estimates using clever mathematical tricks. The problem is that these tricks introduce such enormous constants, so that (I believe) a majority of Big-O published formulas are totally bogus.

They are mathematically correct, but you would never get a sufficiently large n.

I recently optimised an algorithm that was part of a scientist’s MATLAB program. I described how I did it to a computer scientist who poured scorn on my work ‘Trival! It changes nothing in the big-O sense’

The scientist, who’s workflow was now many times faster, bought me a case of wine.

I think another way to say this is that for real world problems the constant matters. If you really want to know what is faster you need to work harder and get a sharp bound. Big-O just tells you how the bound will scale with size. However, if the difference is between O(log(n)) and O(n), then the ratio of constants must be pretty big for log(n) not to win for any appreciable n.

So, if I may paraphrase, big O-notation is a mathematically very useful notion that allows us to talk about algorithms very reasonably, and often corresponds pretty directly to reality. But you have to consider it with a grain of salt — hidden constant factors, the fact that it’s asymptotic, it doesn’t (by itself, necessarily) take into account parallelism/caches/other hardware issues all are things to worry about.

Great. My lecture in my algorithms class has all bases covered.

Michael Mitzenmachersays:

(Oh, and I forgot — it’s worst case analysis, not average/typical/whatever your case is analysis. That’s in my lecture too.)

I would hope that everything in this blog post is standard knowledge and uncontroversial.

Michael Mitzenmachersays:

While I’d agree that the limitations of the O notation you describe should be uncontroversial (and should be clearly taught with the notation), I think you exaggerate when you say:
“However, when designing a software system, the fraction of your decisions that rely on big-O analysis is small. Good engineers rely on more sophisticated models and metrics.”
and
“The big-O notation is far more limited in its applications.”
In my experience, a great many basic decisions stem from understanding what your code is doing at the level of counting operations — is it linear, quadratic, n log n, etc. — and then fine-tuning for constant factors.

So while I agree with all of your caveats, I think of O-notation as not being useful as still an exceptional rather than common case.

Asymptotic analysis tends to be more relevant with polynomial complexities than with (poly-)logarithmic ones. After all, log n â‰ˆ 30 is usually a reasonable estimate, while cache misses cost from tens to hundreds of CPU cycles.

Jouni, as far as I remember (let anybody correct me if I am wrong), quadratic time algorithm for suffix-tree(array) construction are faster in practice than linear ones.

Itman: A few years ago the fastest suffix array construction algorithms (at least under some assumptions) were indeed O(n^2 log n)-time. These days the fastest well-known implementation (under similar assumptions) is Yuta Mori’s libdivsufsort, which uses an O(n log n)-time algorithm.

The caveat is that the benchmarks are usually done with small inputs of up to ~100 MB in size. With such small inputs, memory locality and various heuristic tricks play more important role than in the multi-gigabyte range.

Itman: The quadratic complexities of those fast algorithms are basically rough upper bounds. A reasonably straightforward implementation of naive suffix sorting takes O(nL log n) time, where L is the average LCP value. The fast “quadratic time” algorithms basically improve upon this in two ways:

1) They identify a subset of suffixes, whose sorted order can be used to induce the order of the rest of the suffixes. This typically improves the running time by a constant factor.

2) They use some heuristics to identify long repetitions, and sort them in some other way. There are known bad cases for some of the heuristics, but even then the asymptotic complexity does not grow over the naive O(nL log n). For other heuristics, neither bad cases nor better time bounds are known.

Now consider a snapshot of the English language Wikipedia. In the first tens of megabytes, the average LCP is in low tens. When we increase dataset size to hundreds of megabytes or even a few gigabytes, L increases into the 50-100 range. After that, L starts to rise, reaching the 500-1000 range with dataset size in tens of gigabytes.

Then consider the human genome. Due to the long runs of Ns (unknown bases), the average LCP in the entire reference genome is in hundreds of thousands. Yet this is exactly the kind of repetition the heuristics in the quadratic algorithms can easily handle. If we ignore the Ns, the average LCP drops below 100 for the most of the chromosomes (I couldn’t find the numbers for the entire genome).

In both cases, asymptotic analysis gives us reasons to believe that the speed of the “quadratic” algorithms should be in the same neighborhood as the linear and O(n log n)-time algorithms, at least if we consider the typical ~100 MB test cases. Yet as the Wikipedia example suggests, the situation may change with larger inputs.

Will Fitzgeraldsays:This reminds me of my post, O(n) beats O(lg n) I wrote a while ago:

http://willwhim.wpengine.com/2011/07/07/on-beats-olg-n/

Itmansays:Further, Jeff Ulmann complained about people using easy test sets that show their algorithm working, while, in the real word this would not happen. Some, even cheat.

The same problem is with Big-O notation. Computer scientists derive new estimates using clever mathematical tricks. The problem is that these tricks introduce such enormous constants, so that (I believe) a majority of Big-O published formulas are totally bogus.

They are mathematically correct, but you would never get a sufficiently large n.

Mike Crouchersays:I recently optimised an algorithm that was part of a scientist’s MATLAB program. I described how I did it to a computer scientist who poured scorn on my work ‘Trival! It changes nothing in the big-O sense’

The scientist, who’s workflow was now many times faster, bought me a case of wine.

Carson Chowsays:I think another way to say this is that for real world problems the constant matters. If you really want to know what is faster you need to work harder and get a sharp bound. Big-O just tells you how the bound will scale with size. However, if the difference is between O(log(n)) and O(n), then the ratio of constants must be pretty big for log(n) not to win for any appreciable n.

Itmansays:@Carson Chow. The problem is that it is not always log(n) vs n. Quite often it is log^3(n) or, say, log^5(n). (have a real example, of course).

John Regehrsays:Mike, you need to find a computer scientist to talk to who isn’t an idiot.

Mark C. Wilsonsays:In addition to this point, the smaller but important point is that big-O gives only an upper bound, asymptotically, and not necessarily a tight one. Use big-Omega for that. See http://mcw.wordpress.fos.auckland.ac.nz/2011/09/27/big-omicron-and-big-omega-and-big-theta/

Michael Mitzenmachersays:So, if I may paraphrase, big O-notation is a mathematically very useful notion that allows us to talk about algorithms very reasonably, and often corresponds pretty directly to reality. But you have to consider it with a grain of salt — hidden constant factors, the fact that it’s asymptotic, it doesn’t (by itself, necessarily) take into account parallelism/caches/other hardware issues all are things to worry about.

Great. My lecture in my algorithms class has all bases covered.

Michael Mitzenmachersays:(Oh, and I forgot — it’s worst case analysis, not average/typical/whatever your case is analysis. That’s in my lecture too.)

Daniel Lemiresays:@Michael

I would hope that everything in this blog post is standard knowledge and uncontroversial.

Michael Mitzenmachersays:While I’d agree that the limitations of the O notation you describe should be uncontroversial (and should be clearly taught with the notation), I think you exaggerate when you say:

“However, when designing a software system, the fraction of your decisions that rely on big-O analysis is small. Good engineers rely on more sophisticated models and metrics.”

and

“The big-O notation is far more limited in its applications.”

In my experience, a great many basic decisions stem from understanding what your code is doing at the level of counting operations — is it linear, quadratic, n log n, etc. — and then fine-tuning for constant factors.

So while I agree with all of your caveats, I think of O-notation as not being useful as still an exceptional rather than common case.

Jounisays:Asymptotic analysis tends to be more relevant with polynomial complexities than with (poly-)logarithmic ones. After all, log n â‰ˆ 30 is usually a reasonable estimate, while cache misses cost from tens to hundreds of CPU cycles.

Itmansays:Jouni, as far as I remember (let anybody correct me if I am wrong), quadratic time algorithm for suffix-tree(array) construction are faster in practice than linear ones.

Jounisays:Itman: A few years ago the fastest suffix array construction algorithms (at least under some assumptions) were indeed O(n^2 log n)-time. These days the fastest well-known implementation (under similar assumptions) is Yuta Mori’s libdivsufsort, which uses an O(n log n)-time algorithm.

The caveat is that the benchmarks are usually done with small inputs of up to ~100 MB in size. With such small inputs, memory locality and various heuristic tricks play more important role than in the multi-gigabyte range.

Itmansays:Jouni,

Thank you for the update. Yes, this may be the case. But, still, up to a “small” 100 MB input, the quadratic beats the classic LINEAR.

Jounisays:Itman: The quadratic complexities of those fast algorithms are basically rough upper bounds. A reasonably straightforward implementation of naive suffix sorting takes O(nL log n) time, where L is the average LCP value. The fast “quadratic time” algorithms basically improve upon this in two ways:

1) They identify a subset of suffixes, whose sorted order can be used to induce the order of the rest of the suffixes. This typically improves the running time by a constant factor.

2) They use some heuristics to identify long repetitions, and sort them in some other way. There are known bad cases for some of the heuristics, but even then the asymptotic complexity does not grow over the naive O(nL log n). For other heuristics, neither bad cases nor better time bounds are known.

Now consider a snapshot of the English language Wikipedia. In the first tens of megabytes, the average LCP is in low tens. When we increase dataset size to hundreds of megabytes or even a few gigabytes, L increases into the 50-100 range. After that, L starts to rise, reaching the 500-1000 range with dataset size in tens of gigabytes.

Then consider the human genome. Due to the long runs of Ns (unknown bases), the average LCP in the entire reference genome is in hundreds of thousands. Yet this is exactly the kind of repetition the heuristics in the quadratic algorithms can easily handle. If we ignore the Ns, the average LCP drops below 100 for the most of the chromosomes (I couldn’t find the numbers for the entire genome).

In both cases, asymptotic analysis gives us reasons to believe that the speed of the “quadratic” algorithms should be in the same neighborhood as the linear and O(n log n)-time algorithms, at least if we consider the typical ~100 MB test cases. Yet as the Wikipedia example suggests, the situation may change with larger inputs.