It should be possible to calculate the tipping point at which the authors’ algorithm outperforms gmplib without implementing and running the algorithm. Just don’t ignore all the constants and lower-order terms in the cost function. It would still be an approximation, so your remark remains valid; however, it would make clear if the algorithm optimization still holds on a realistic computer system.
it would make clear if the algorithm optimization still holds on a realistic computer system
Importantly, in the breakthrough paper, the authors make no such claim as far as I can tell.
I think you could make reasonable estimates if you are sufficiently careful, and it could be a scientific claim (i.e., that can be invalidated). Knuth has done such work, for example. It is uncommon.
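For concreteness, here is a sketch of the kind of back-of-the-envelope estimate being discussed. Everything in it is an assumption for illustration: gmplib’s large-operand multiplication is modelled as roughly n·log n·log log n bit operations, the new algorithm as n·log n with a larger constant, and the constant-factor penalties and helper names are invented, not taken from the paper or from measurements.

```python
import math

# Hypothetical cost models (assumptions, not measurements), with n = bits per operand:
#   gmplib (FFT-style multiplication): c_gmp * n * log2(n) * log2(log2(n))
#   new algorithm:                     c_new * n * log2(n)

def crossover_log2_bits(c_new, c_gmp=1.0):
    """Return log2 of the operand size (in bits) past which the new
    algorithm would win under these assumed models.

        c_new * n * log2(n) < c_gmp * n * log2(n) * log2(log2(n))
        <=> log2(log2(n)) > c_new / c_gmp
        <=> log2(n) > 2 ** (c_new / c_gmp)
    """
    return 2.0 ** (c_new / c_gmp)

if __name__ == "__main__":
    for penalty in (4, 16, 64):  # assumed constant-factor penalties of the new algorithm
        print(f"penalty {penalty:>3}x: operands need more than "
              f"2**{crossover_log2_bits(penalty):.3g} bits")
```

Under these toy models the crossover operand size grows doubly exponentially in the constant-factor penalty, which is exactly why the constants and lower-order terms cannot be ignored in such an estimate.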
foobar says:
I think the primary reason they don’t even demonstrate their algorithm is – if I understood it correctly – that for reasonably sized inputs the algorithm doesn’t actually behave any differently from the algorithms with the previously best theoretical run-time complexity. Here, being outside “reasonable” means that the tipping point where differences start to show may require more storage than the whole observable universe could provide to a classical computer, simply to hold the inputs that would show the difference. Surely this is well beyond any real-life machinery we could imagine for practical benchmarking…
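To put a rough number on the “observable universe” remark: the figures below are commonly quoted physical estimates (about 10^80 atoms in the observable universe) plus a purely hypothetical crossover size; none of it comes from the paper.

```python
import math

# Rough physical upper bound on classical storage: even at one bit per atom
# (wildly generous), the observable universe holds about 10**80 bits.
ATOMS = 10 ** 80
max_bits = ATOMS  # one bit per atom

print(f"at most ~2**{math.log2(max_bits):.0f} bits of storage exist")  # ~2**266

# If the tipping point required operands of, say, 2**300 bits or more
# (a hypothetical figure, not taken from the paper), no benchmarkable
# input could ever be written down.
hypothetical_crossover_bits = 2 ** 300
print("benchmarkable:", hypothetical_crossover_bits <= max_bits)  # False
```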
Ben says:
You seem a bit too harsh on purely mathematical algorithms research. Perhaps their algorithm isn’t generally better-in-practice, but their analysis opens the door to someone else making further progress later. Of course I do wonder if much of the research going on now will ever lead anywhere interesting. But in principle there can be value in foundational theoretical work.
My second last paragraph makes this point.
It implies there exists a certain value of n past which it will be faster.
It does not.
Good. It would be an embarrassment to be known as a scientist or an engineer.
People who look down on engineering should not make engineering claims.
Wilco says:
Good post, and fully agreed! Unfortunately this issue is prevalent even among software engineers. All too often people focus on theoretical worst-case behaviour without any regard for performance in the real world (e.g., replacing Quicksort with a much slower sort).
In my experience a good implementation of an algorithm with a worst case of O(N * N) can seriously outperform a guaranteed O(N) algorithm (see, e.g., [1]). For the strstr case the constant factors in the O-notation vary by about two orders of magnitude, and triggering the worst cases requires special randomized inputs which maximize the number of branch mispredictions in each implementation.
So yes, claiming one algorithm is better than another solely based on theoretical complexity is completely missing the point. Another crazy assumption is that using O(N) temporary storage is just for “free”. Many string search algorithms require a pointer for every input character, i.e., 8N bytes for N input bytes, making them completely useless in the real world.
[1] https://github.com/bminor/glibc/blob/master/string/strstr.c
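As a toy illustration of the constant-factor point (this is Python, not the glibc code linked in [1], and the exact numbers depend heavily on the machine and the inputs): a naive search with an O(N·M) worst case is compared against a textbook Knuth–Morris–Pratt search, which is guaranteed linear but builds an extra table and does more bookkeeping per character. On ordinary inputs the “worse” algorithm usually wins.

```python
import time

def naive_find(text, pattern):
    """O(len(text) * len(pattern)) worst case, but a tiny constant factor."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            return i
    return -1

def kmp_find(text, pattern):
    """Guaranteed O(len(text) + len(pattern)), but needs an extra table
    (one entry per pattern character) and more work per text character."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    k = 0
    for i, c in enumerate(text):
        while k and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
            if k == len(pattern):
                return i - k + 1
    return -1

if __name__ == "__main__":
    text = "abcdefgh" * 250_000 + "needle"   # ~2 MB of ordinary, non-adversarial text
    pattern = "needle"
    for name, search in (("naive", naive_find), ("KMP", kmp_find)):
        start = time.perf_counter()
        pos = search(text, pattern)
        print(f"{name:>5}: found at {pos}, {time.perf_counter() - start:.3f}s")
```

None of this contradicts the worst-case analysis: on adversarial inputs the naive loop degrades to quadratic time, which is exactly the trade-off described above.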
Please pause to consider: if you are a scientist, you need to make predictions that apply to our universe; otherwise you are doing mathematics.
I think that’s the issue, and it’s one of semantics. Theoretical CS has always had one foot firmly in the “pure math” camp. Based on my non-exhaustive Google search, it seems that a lot of people are in the “math is not science” camp because it lacks empirical verification or experiments involving natural phenomena. So the “S” in “CS” becomes problematic for you.
Really, who cares? At the level of the original paper and the claims of the author, no one is being fooled. Everyone in the know understands the difference between these theoretical bounds and real-world performance. Yes, it’s a math result, not an “applied CS” result. A lot of other important results in CS qualify as “math”. What do you want the authors to do? Refuse to publish their (IMO, very important) result because it doesn’t satisfy your “this universe, empirical” requirement?
So I think you can only really pick a bone with the terminology in the Quanta article. Frankly, this is an article for lay people, even if Quanta still goes deeper than, say, “USA Today”. Yes, it is very remiss in not mentioning that this technique is totally infeasible, but I don’t think the bar is very high for this kind of article, and honestly a lot of it is good. They could have been more honest by walking back some of the claims and including a good disclaimer about practical speeds, but I feel your attack here is broader than called for.
Everyone who builds systems does. The authors of the cited paper almost surely do.
Admittedly, I haven’t read the paper (maybe link it?), but do the authors claim anything other than a pure mathematical result?
That is, do they make the kind of “faster and applicable in the real world” claims that you object to, or do those only appear in the magazine?
They do not. I would have criticized them.
This problem of misusing asymptotic efficiency often takes place in the area of data structures.
We tend, as humans, to reduce very complex topics to a simple number or description for comparison, and then we ignore the actual meaning of that simplification when we compare.
In the context of data structures, big-O notation usually counts the number of operations. However, this becomes unrealistic for state-of-the-art memory hierarchies, which is why, for example, cache-oblivious algorithms measure asymptotic efficiency in the number of memory accesses instead.
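A minimal sketch of that point (in Python, so the effect is muted by interpreter overhead; in C the gap is much larger, and the exact numbers depend on the machine): both passes below perform exactly the same number of operations, one lookup and one addition per element, but the strided pass defeats the caches and the prefetcher.

```python
import array
import time

N = 1 << 24                     # 16M 8-byte integers, ~128 MB: far larger than the CPU caches
data = array.array("q", range(N))

def sum_with_stride(buf, step):
    """Touch every element exactly once; only the access pattern changes."""
    total = 0
    for start in range(step):
        for i in range(start, len(buf), step):
            total += buf[i]
    return total

for step in (1, 4097):          # 1 = sequential, 4097 = ~32 KB jumps between accesses
    t0 = time.perf_counter()
    total = sum_with_stride(data, step)
    print(f"stride {step:>4}: {time.perf_counter() - t0:.2f}s (sum={total})")
```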
I think the point of the article is not to speak badly of the publication per se, but rather of the media coverage that claims existing multiplication implementations are slower. The fact that they assume the ability to handle big enough numbers does not mean existing computers will handle them properly.
It reminds me of the claims about quantum computers and their supposedly superior ability to tackle any problem compared to traditional computers.