23rd September 2020, 8 min read

How expensive is integer-overflow trapping in C++?

11 thoughts on “How expensive is integer-overflow trapping in C++?”

Andre Bogus says:

September 24, 2020 at 7:25 am

“Overall this one test does establish that checking for overflows can be expensive. ”

That’s a load-bearing “can” right here. So the gcc people chose not to optimize overflow checking to emit something sensible, therefore overflow checking is too slow, therefore we are stuck with the sad state of affairs.

In practice, we are likely to see tasks far more complex than adding an array of integers, so the actual overhead is likely to be in the single digit percents.

Rust chose to omit checks in release mode because a) it’s not memory unsafe to have overflows (though the results can also be dire depending on context), and b) they, in my opinion correctly, assumed that they need to reach perf parity with C and C++, or else they wouldn’t be seen as an alternative.

At least Rust does bounds-checking by default (which can often be optimized out by using iterators instead of indexing, or by asserting the length by taking a slice).
1. Daniel Lemire says:
  
  September 24, 2020 at 1:57 pm
  
  “therefore overflow checking is too slow, therefore we are stuck with the sad state of affairs.”
  
  The other side of the coin is that the first step in fixing a performance problem is to document it. GCC could multiply its performance.
Thomas Neumann says:

September 24, 2020 at 8:22 am

It is interesting to note that without -ftrapv clang uses SIMD instructions, while with -ftrapv clang uses a scalar addition plus a jo instruction. That probably causes most of the performance difference: the overflow check itself is not that expensive, but it prevents SIMD usage.

Your demo program computes the sum over an array, which allows for auto-vectorization. In programs where the compiler cannot vectorize computations the overhead is probably smaller. When I add -fno-vectorize to your demo program the runtime difference with and without -ftrapv is 43% on an AMD 1950X.
John Smith says:

September 24, 2020 at 8:30 am

You can write the check yourself with __builtin_add_overflow.
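A minimal sketch of what such a hand-written check might look like, assuming GCC 5+ or Clang (both provide `__builtin_add_overflow`). The helper names `trapping_add` and `trapping_sum` are hypothetical and chosen only for illustration; they are not taken from the post's benchmark.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: adds two 32-bit integers and aborts on overflow.
inline std::int32_t trapping_add(std::int32_t a, std::int32_t b) {
    std::int32_t result;
    // The builtin returns true when the exact sum does not fit in result.
    if (__builtin_add_overflow(a, b, &result)) {
        std::abort();
    }
    return result;
}

// Hypothetical helper: sums a vector, checking every single addition.
inline std::int32_t trapping_sum(const std::vector<std::int32_t>& values) {
    std::int32_t sum = 0;
    for (std::int32_t v : values) {
        sum = trapping_add(sum, v);
    }
    return sum;
}
```

Writing the check by hand like this lets you choose, per call site, whether to abort, return an error, or take some other path.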
Peter Dimov says:

September 24, 2020 at 11:56 am

-fsanitize=undefined is a better way to trap integer overflow nowadays.
Dmitrii Loginov says:

September 24, 2020 at 12:12 pm

Did you check __builtin_add_overflow or similar functions?
Dmitrii Loginov says:

September 24, 2020 at 12:28 pm

https://gcc.godbolt.org/z/bsTxPv

GCC implements -ftrapv by replacing ‘+’ with a call to the ‘__addvsi3’ function: https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libgcc/libgcc2.c#L87

Clang implements it by using ‘setno’. Same as __builtin_add_overflow. https://gcc.godbolt.org/z/eqxYxc

That means Clang crashes the program with a ‘UD2’ instruction, whereas GCC calls ‘abort’. Which behaviour is more desirable to you?
1. Stefan Kanthak says:
  
  September 24, 2020 at 7:15 pm
  
  Better dare to take a look at the code of the addv?i3() functions shipped in libgcc.a: it’s outright HORRIBLE!
  
      lea rax, [rdi, rsi]
      test rsi, rsi
      js .negative
      cmp rdi, rax
      jg .somewhere
      ...
  .negative:
      cmp rdi, rax
      jl .elsewhere
      ....
  
  JFTR: what LLVM ships in their compiler-rt is equally bad. It’s a real shame, for both of them!
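For readers who prefer C++ to assembly, here is a rough sketch of the branch-based check that the quoted instructions appear to perform, placed next to the builtin-based alternative. It assumes the 64-bit variant (the quoted code uses 64-bit registers). The function names are hypothetical, and this is not the actual libgcc or compiler-rt source.

```cpp
#include <cstdint>

// Branch-based detection: compute the wrapped sum, then compare it with
// one operand, choosing the comparison from the sign of the other operand.
inline bool add_overflows_branchy(std::int64_t a, std::int64_t b) {
    // Unsigned addition wraps and is well defined. Converting back to a
    // signed type is modular on GCC and Clang (and guaranteed since C++20).
    const std::int64_t w = static_cast<std::int64_t>(
        static_cast<std::uint64_t>(a) + static_cast<std::uint64_t>(b));
    return (b >= 0) ? (w < a) : (w > a);
}

// Builtin-based detection: on x86-64 compilers can typically lower this
// to an add followed by a test of the overflow flag.
inline bool add_overflows_builtin(std::int64_t a, std::int64_t b) {
    std::int64_t ignored;
    return __builtin_add_overflow(a, b, &ignored);
}
```

Both functions should agree on every input; the difference is only in how much work the generated code has to do.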
Robert Ruedisueli says:

September 24, 2020 at 2:06 pm

It would be nice to be able to turn this on in specific variables, operations or blocks of code that are security sensitive, as well as adjust behavior based on what the variable does, including falling back on an alternate algorithm that won’t overflow in such cases.
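Something close to this is already possible per operation with the overflow builtins. Below is a minimal sketch, assuming GCC 5+ or Clang, of a 32-bit sum that falls back to a wider accumulator when the fast path would overflow. The function name `sum_with_fallback` is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: tries a checked 32-bit sum first and switches to a
// 64-bit accumulation if any 32-bit addition would overflow.
inline std::int64_t sum_with_fallback(const std::vector<std::int32_t>& values) {
    std::int32_t fast = 0;
    for (std::int32_t v : values) {
        if (__builtin_add_overflow(fast, v, &fast)) {
            // Alternate algorithm: redo the sum in 64 bits. This cannot
            // overflow for any vector with fewer than 2^32 elements.
            std::int64_t wide = 0;
            for (std::int32_t w : values) {
                wide += w;
            }
            return wide;
        }
    }
    return fast;
}
```

Only the code that calls the builtin pays for the check, so the rest of the program can stay unchecked.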
Antoine says:

October 16, 2020 at 3:11 pm

The portable-snippets library has portable wrappers for many useful operations, including overflow-checking arithmetic. They use the relevant compiler intrinsics where possible.
https://github.com/nemequ/portable-snippets/tree/master/safe-math
Stefan Kanthak says:

October 19, 2020 at 8:32 am

For the real horror show perform multiplication instead of addition: on an AMD EPYC 7262, GCC shows a 14x slowdown, beaten by Clang with a whopping 537x slowdown!
This is just one of the many “highly tuned implementations of the low-level code generator support routines” which LLVM brags about on their web pages.
You gotta love such blatant lies!
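As with addition, a multiplication can be checked explicitly instead of relying on -ftrapv. This is a minimal sketch, assuming GCC 5+ or Clang, which both provide `__builtin_mul_overflow`. The wrapper name `checked_mul` is hypothetical, and no claim is made here about how its speed compares with the figures above.

```cpp
#include <cstdint>

// Hypothetical helper: multiplies two 64-bit integers. Returns true and
// stores the product in out on success; returns false on overflow.
inline bool checked_mul(std::int64_t a, std::int64_t b, std::int64_t& out) {
    std::int64_t result;
    if (__builtin_mul_overflow(a, b, &result)) {
        return false;  // out is left untouched on overflow
    }
    out = result;
    return true;
}
```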