“Overall this one test does establish that checking for overflows can be expensive. ”
That’s a load-bearing “can” right there. So the GCC people chose not to optimize overflow checking to emit something sensible; therefore overflow checking is too slow; therefore we are stuck with the sad state of affairs.
In practice, we are likely to see tasks far more complex than adding an array of integers, so the actual overhead is likely to be in the single-digit percents.
Rust chose to omit checks in release mode because a) it’s not memory-unsafe to have overflows (though the results can still be dire depending on context), and b) they, in my opinion correctly, assumed that they needed to reach performance parity with C and C++, or else they wouldn’t be seen as an alternative.
At least Rust does bounds-checking by default (which can often be optimized out by using iterators instead of indexing, or by asserting the length by taking a slice).
The other side of the coin is that the first step in fixing a performance problem is to document it. GCC could multiply its performance.
Thomas Neumann says:
It is interesting to note that without -ftrapv clang uses SIMD instructions, while with -ftrapv it uses a scalar addition plus a jo instruction. This probably causes most of the performance difference: the overflow check itself is not that expensive, but it prevents SIMD usage.
Your demo program computes the sum over an array, which allows for auto-vectorization. In programs where the compiler cannot vectorize computations the overhead is probably smaller. When I add -fno-vectorize to your demo program the runtime difference with and without -ftrapv is 43% on an AMD 1950X.
John Smith says:
You can write the check yourself with __builtin_add_overflow.
Peter Dimov says:
-fsanitize=undefined is a better way to trap integer overflow nowadays.
Dmitrii Loginov says:
Did you check __builtin_add_overflow or similar functions?
JFTR: what LLVM ships in their compiler-rt is equally bad. It’s a real shame, for both of them!
Robert Ruedisueli says:
It would be nice to be able to turn this on for specific variables, operations, or blocks of code that are security-sensitive, and to adjust the behavior based on what the variable does, including falling back on an alternate algorithm that won’t overflow in such cases.
For the real horror show, perform multiplication instead of addition: on an AMD EPYC 7262, GCC shows a 14x slowdown, beaten by Clang with a whopping 537x slowdown!
This is just one of the many “highly tuned implementations of the low-level code generator support routines” which LLVM brags about on their web pages. You gotta love such blatant lies!
https://gcc.godbolt.org/z/bsTxPv
GCC implements -ftrapv by replacing ‘+’ with a call to the ‘addvsi3’ function: https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libgcc/libgcc2.c#L87
Clang implements it inline, using ‘setno’, the same as __builtin_add_overflow: https://gcc.godbolt.org/z/eqxYxc
That means Clang crashes the program with a ‘UD2’ instruction, while GCC calls ‘abort’. Which behaviour is more desirable to you?
Better yet, take a look at the code of the addv?i3() functions shipped in libgcc.a: it’s outright HORRIBLE!
lea rax, [rdi+rsi]
test rsi, rsi
js .negative
cmp rdi, rax
jg .somewhere
...
.negative:
cmp rdi, rax
jl .elsewhere
...
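In C, that branchy sequence is roughly the following — a reconstruction for illustration, not the actual libgcc source (libgcc itself is built so that the wrapping addition is well-defined):

```c
// Rough C equivalent of the branchy libgcc-style signed-add check above.
#include <stdlib.h>

long addv_long(long a, long b) {
    /* wrapping sum, as computed by: lea rax, [rdi+rsi] */
    long w = (long)((unsigned long)a + (unsigned long)b);
    if (b >= 0) {              /* test rsi, rsi / js .negative */
        if (a > w) abort();    /* positive b: overflow iff the sum went down */
    } else {
        if (a < w) abort();    /* negative b: overflow iff the sum went up */
    }
    return w;
}
```

Compare that to the single add-plus-jo sequence the builtin produces, and the complaint about the library routines becomes easy to understand.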
The portable-snippets library has portable wrappers for many useful operations, including overflow-checking arithmetic. They use the relevant compiler intrinsics where possible.
https://github.com/nemequ/portable-snippets/tree/master/safe-math