Another point that is often overlooked: all your comments apply to Theta, rather than Omicron, in which case there is some actual content. “Big-O” is not at all relevant, for another reason. See my old post.
Abdullah says:
Well, you didn’t mention the right answer for such a question. Clearly, with big enough input, the algorithm with smaller big O will run faster, and this seems to be the right answer whether we like it or not. Still, it could be the case that we always or most of the time have input sizes much smaller than “big enough”, in which case it makes perfect sense to switch to an algorithm that runs slower with big inputs and faster with small ones. Thanks for yet another good entry.
(…) with big enough input, the algorithm with smaller big O will run faster (…)
It will run faster in some cases. And that is assuming that the computational model matches your architecture.
Very often, the big-O notation says nothing about how fast the algorithm will be in practice.
I am not saying you should use bubble sort on large arrays… I’m saying that there is a lot more to high-performance computing than the big-O notation, in the same way that there is a lot more to engine design than thermodynamics.
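To make the crossover concrete, here is a minimal Python sketch (added here for illustration; it is not code from the post or the comments). It times a hand-written O(N^2) insertion sort against a hand-written O(N log N) merge sort at several input sizes; where the lines cross depends entirely on the machine and the runtime.

```python
import random
import timeit

def insertion_sort(a):
    """O(N^2) sort with a tiny constant; works on a copy."""
    a = list(a)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def merge_sort(a):
    """O(N log N) sort with more bookkeeping per element."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

for n in (16, 128, 1024):
    data = [random.random() for _ in range(n)]
    reps = max(1, 50_000 // n)  # fewer repetitions for larger inputs
    t_ins = timeit.timeit(lambda: insertion_sort(data), number=reps)
    t_mrg = timeit.timeit(lambda: merge_sort(data), number=reps)
    print(f"N={n:5d}  insertion sort: {t_ins:.4f}s  merge sort: {t_mrg:.4f}s")
```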
Abdullah says:
I get it now and this part says it all:
The problem with the big-O notation is that it is only meant to help you think about algorithmic problems. It is not meant to serve as the basis by which you select an algorithm!
Thanks for the clarification.
Alex says:
The hyperbole surrounding worst-case complexity analysis these days is really distasteful. Headlines like “The big-O notation is a teaching tool” or “O-notation considered harmful” seem to advocate that people be ignorant, even if the article text ends up amounting to: the average case is often more important.
Any good developer should understand and consider both average-case and worst-case performance. Both have real-world implications for things like cache behaviour and memory structure.
Big-O notation is just an efficient way for people to express ideas like “doubling the size of the data multiplies the running time by four.” It has nothing to do with teaching or “programmer experience.”
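As an aside, the “doubling the data multiplies the time by four” rule of thumb is easy to check empirically. The sketch below is a hypothetical example added for illustration: it times a naive O(N^2) pair scan at N, 2N and 4N and prints the ratios, which only approach 4 once N is large enough for the quadratic term to dominate.

```python
import random
import time

def count_close_pairs(points, eps=0.01):
    # Naive O(N^2) scan over all pairs.
    count = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(points[i] - points[j]) < eps:
                count += 1
    return count

timings = {}
for n in (1000, 2000, 4000):
    pts = [random.random() for _ in range(n)]
    start = time.perf_counter()
    count_close_pairs(pts)
    timings[n] = time.perf_counter() - start

print("ratio 2000/1000:", timings[2000] / timings[1000])
print("ratio 4000/2000:", timings[4000] / timings[2000])
```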
My title is indeed hyperbolic. However, I am not advocating that people be ignorant. I believe that the big-O notation is one of the first things anyone serious about software should learn as it will save them much pain later on.
I am, however, also saying that it is far from enough. If that’s all you know about how to assess the speed of an algorithm, you do not know enough.
Aykut Bulut says:
Hi Lemire. I follow your blog. As I work toward my PhD degree, I learn a lot from your writings, especially the ones about academic writing.
I have a comment about the following section.
“When asked why an algorithm with better computational complexity fails to be faster, people often give the wrong answers:”
I think you mean an algorithm with a better running time. Computational complexity is defined for problems. The performance of algorithms is measured by running time. Complexity of a problem is determined by the best algorithm that solves the problem.
I think you make a valid point. I have updated my blog post.
Mehmet Suzen says:
@Aykut Bulut
“Computational complexity is defined for problems.”
Complexity can also be defined for an algorithm or a program. More precisely, the information content of its data structures can be measured using algorithmic information theory, i.e., Kolmogorov complexity.
“Complexity of a problem is determined by the best algorithm that solves the problem.”
Instead of the best, I think you mean the shortest algorithm. Measures of complexity vary. Seth Lloyd has a list of measures:
http://web.mit.edu/esd.83/www/notebook/Complexity.PDF
“… doubling the data size will multiply the running time by a factor of four, in the worst case.”
I think our ‘misconception’ comes from this: big-O notation measures the asymptotic behaviour of the running time against the size of the input N, i.e., very-large-N behaviour. So the above statement may not be true for “small inputs”. For example, if the running-time complexity is measured as 4*N^2, we still call the algorithm’s running-time complexity N^2. However, for small N, doubling the small data size MAY multiply the running time by a factor of 16!
@Mehmet Suzen
Are you sure of that example with 4*N^2?
@Daniel Lemire
It is an elementary concept. Sorry, 4*N^2 was not a good example and my numbers were wrong. The idea was to illustrate that the asymptotic behaviour shown by big-O notation could be “misleading” for “smaller” inputs. Let’s take the number of operations T(N) = 4*N^2 - 2N. T'(4) must be 4 times T'(2) if we consider T'(N) = N^2. But if we use T(N) instead, T(4) would be 4.666667 times T(2). Actually you have given this in your post, but I don’t know where the 10TB example comes from; usually using more data fulfills the scaling. Of course, within reason, 10TB is too big for a single machine (for now). In my experience with N-body force calculations, an O(N^2) algorithm performs better compared to O(N log N) ones up to a break-even point, even at 50K particles. This is a good paper explaining this:
Pringle, Gavin J. “Comparison of an O(N) and an O(N log N) N-body solver.” PPSC (1995): 337–342.
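For readers who want the arithmetic spelled out, here is a tiny script (added for illustration) that evaluates the T(N) = 4*N^2 - 2N above and compares the ratio T(4)/T(2) with the factor of 4 predicted by the N^2 approximation; for larger N the two agree closely.

```python
def T(N):
    # Exact operation count used in the comment above.
    return 4 * N**2 - 2 * N

def T_approx(N):
    # Asymptotic approximation: only the N^2 term is kept.
    return N**2

print("T(4)/T(2)           =", T(4) / T(2))                  # 56/12, about 4.67
print("T'(4)/T'(2)         =", T_approx(4) / T_approx(2))    # exactly 4
print("T(400)/T(200)       =", T(400) / T(200))              # about 4.005
```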
Well, there are two problems with big-O. One is that architecture does matter. A second one is that the constant hidden by the big-O matters as well (in certain cases, it is a bit more complicated than a constant, but a constant is a good approximation anyway). In fact, these considerations are interconnected.
Exploiting the architecture, e.g., using a cache-friendly algorithm or SIMD instructions, does affect the constant. Yet the effect is typically not so great. But it is of practical importance, so one should not rely on just asymptotic estimates.
However, it really helps to think in terms of big-O plus a constant. This is not premature optimization and often helps a lot. Regular expressions, as you may guess, are a bad example in my opinion, because it is possible to make most of them efficient. We could have avoided this problem with developers who (1) respect the theory and (2) understand not only big-O, but also the hidden constant.
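One way to see the hidden constant move while the asymptotics stay fixed is a memory-layout experiment. The following sketch is added for illustration and assumes NumPy is available: it sums a large matrix row by row and then column by column. Both traversals are O(N^2), but the cache-unfriendly one typically carries a noticeably larger constant; the exact gap depends on the machine and on NumPy’s striding overhead.

```python
import time
import numpy as np

n = 4000
a = np.random.rand(n, n)  # row-major (C order) by default

def time_it(f):
    start = time.perf_counter()
    f()
    return time.perf_counter() - start

# Row-wise traversal reads contiguous memory (cache-friendly).
row_major = time_it(lambda: sum(a[i, :].sum() for i in range(n)))
# Column-wise traversal jumps across rows (cache-unfriendly, strided).
col_major = time_it(lambda: sum(a[:, j].sum() for j in range(n)))

print(f"row-wise sum:    {row_major:.3f}s")
print(f"column-wise sum: {col_major:.3f}s")
```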
Nir says:
I’ve encountered this effect firsthand. I was aware of the ideas here (at least to some degree); for instance I know that standard implementations of quicksort often switch to insertion sort on small arrays.
I recently solved a problem in Matlab in three different ways. Two were Theta(N) solutions that involved making several passes and some processing. The third involved sorting the array first, followed by a comparatively simple follow-up step. To my surprise, the algorithm involving sorting was faster for all the array sizes that I tried (I think on the order of N = 1 million).
Built-in functions like sorting are likely to be so tweaked and optimized (especially in high-level languages), compared to other code you can write, that they may beat out Theta(N) algorithms.
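In the same spirit as Nir’s Matlab experiment, here is a hypothetical Python comparison (not from the original thread): a top-k query answered by the heavily optimized built-in sort (O(N log N)) versus a hand-written Theta(N) heap scan. Which one wins depends on N, the runtime, and the machine; the point is only that the tuned built-in routine is a serious contender despite its worse complexity.

```python
import heapq
import random
import timeit

data = [random.random() for _ in range(1_000_000)]
k = 10

def top_k_by_sort(values, k):
    # O(N log N) overall, but the sort runs in optimized native code.
    return sorted(values, reverse=True)[:k]

def top_k_by_scan(values, k):
    # Theta(N) single pass with a small min-heap, in interpreted Python.
    heap = values[:k]
    heapq.heapify(heap)
    for v in values[k:]:
        if v > heap[0]:
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)

print("built-in sort:", timeit.timeit(lambda: top_k_by_sort(data, k), number=5))
print("linear scan:  ", timeit.timeit(lambda: top_k_by_scan(data, k), number=5))
```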