
GNU GCC on x86 does not round floating-point divisions to the nearest value

60 thoughts on “GNU GCC on x86 does not round floating-point divisions to the nearest value”

  1. Brian Kessler says:

    I think you should be more specific and state that the issue is with the x87 floating-point unit. The x87 has one internal precision that defaults to 80 bits, though this can be set to 32 or 64 bits using the FLDCW instruction to load the control word for the floating-point unit. For most cases, people want and expect the internal precision to equal the operand precision (FLT_EVAL_METHOD=0), which is what SSE instructions do. Note that explicitly requesting SSE does actually work in gcc as long as the processor supports SSE. See here in godbolt https://godbolt.org/z/Xqi_S7
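
    A minimal sketch for checking which evaluation method a given build uses; FLT_EVAL_METHOD is the standard macro from <float.h>, and the program is illustrative, not part of the original test:

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        /* 0: evaluate in each operand's own precision (typical SSE codegen)
           1: evaluate float and double as double
           2: evaluate float and double as long double (classic x87 codegen) */
        printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
        return 0;
    }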

    For your docker test, you are using the i386/ubuntu image and so you need to explicitly tell gcc that you have a machine with sse instructions, e.g. -march=pentium4, which gives the desired behavior:

    compiling round with -mfpmath=sse -march=pentium4
    x/y = 0.501782303180000055
    expected x/y (round to even) = 0.501782303180000055
    Expected assumes FLT_EVAL_METHOD = 1
    Equal
    Equal
    FE_DOWNWARD : 0.501782303179999944
    FE_TONEAREST : 0.501782303180000055
    FE_TOWARDZERO: 0.501782303179999944
    FE_UPWARD : 0.501782303180000056
    FLT_EVAL_METHOD = 0

    The real question is why gcc is emitting any x87 instructions (except for long double) on architectures that support SSE. I would expect -march=pentium4 to use SSE by default, but SSE has to be specifically requested.

    1. I think you should be more specific and state that the issue is with the x87 floating-point unit. The x87 has one internal precision that defaults to 80 bits, though this can be set to 32 or 64 bits using the FLDCW instruction to load the control word for the floating-point unit. For most cases, people want and expect the internal precision to equal the operand precision (FLT_EVAL_METHOD=0), which is what SSE instructions do.

      Did you get the part where if you add the flag ‘-O2’ then FLT_EVAL_METHOD remains unchanged, but the rounding becomes correct (meaning that it rounds to the nearest 64-bit float)?

      For most cases, people want and expect the internal precision to equal the operand precision (FLT_EVAL_METHOD=0), which is what SSE instructions do.

      If the x87 unit can do it, why isn’t it the default since we agree that it is what most people want?

      1. Brian Kessler says:

        Looking at godbolt, gcc -O2 appears to compute the division at compile time using the precision of the operands. I don’t see any division instructions issued at all, so the FLT_EVAL_METHOD is not relevant in this case since the x87 is not actually performing the calculations. Compile time and run time floating point calculations not matching is a whole other can of worms 🙂

        As mentioned above, the x87 unit can perform calculations only at a single given precision at a time, set by altering the precision field in the control word. It could emulate FLT_EVAL_METHOD=0, but the precision would have to be updated when switching between any calculations using float, double or long double. I imagine this might not be the default behavior because of the performance overhead of loading and restoring the precision in the control word for different precisions. For code that mostly or solely uses doubles, the compiler could avoid adjusting the precision most of the time, at the cost of tracking the precision state throughout the code. It would be interesting to see if any compiler has a mode where it adjusts the x87 precision automatically to emulate FLT_EVAL_METHOD=0.
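
        For the curious, a minimal sketch of such a control-word adjustment on Linux/glibc; the helper name is illustrative, and note this only rounds significands to 53 bits while the extended exponent range remains:

        #include <fpu_control.h>

        /* Ask the x87 to round significands to 53 bits (double precision). */
        static void x87_set_double_precision(void) {
            fpu_control_t cw;
            _FPU_GETCW(cw);                            /* read the current control word */
            cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  /* clear the precision field, select 53 bits */
            _FPU_SETCW(cw);                            /* write it back */
        }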

        1. I don’t see any division instructions issued at all, so the FLT_EVAL_METHOD is not relevant in this case since the x87 is not actually performing the calculations. Compile time and run time floating point calculations not matching is a whole other can of worms

          This is insanity.

          1. Jules Blok says:

            This is insanity

            This is C++

        2. Shash says:

          Looking at godbolt, gcc -O2 appears to compute the division at compile time using the precision of the operands. I don’t see any division instructions issued at all, so the FLT_EVAL_METHOD is not relevant in this case since the x87 is not actually performing the calculations.

          That’s easily fixed – just move the division to a function so that gcc can’t optimize it away: https://godbolt.org/z/4Tfcup

          Interestingly, this gives us

          x/y = 0.501782303179999944
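
          Roughly what the linked example does (a sketch with illustrative names; noinline keeps GCC from folding the division at compile time):

          #include <stdio.h>

          __attribute__((noinline))
          static double divide(double x, double y) {
              return x / y;   /* now performed at run time, on the x87 with -m32 */
          }

          int main(void) {
              double x = 50178230318.0;
              double y = 100000000000.0;
              printf("x/y = %.18f\n", divide(x, y));
              return 0;
          }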

      2. That Singh says:

        It’s not a bug, at least not as you have related it. If the calculation gets performed at compile time (as you have said), then you have to deal with the fact that the compiler has different knowledge than the run time… And the more you tell it to optimize, the more you cause work to be done in the compiler environment.

  2. Simon Lindholm says:

    Even with SSE enabled you still don’t necessarily get complete IEEE 754 compliance, since GCC by default emits fused multiply-add instructions whenever it can:

    #include <stdio.h>

    double a = 0.1, b = 10.0, c = -1;

    int main() {
        double ab = a * b;
        printf("%.5e\n", ab + c);
    }

    This prints 5.55112e-17 with GCC -O2 -march=native (or -mfma), and 0.00000e+00 with GCC -O2. (Can be turned off with -ffp-contract=off.)

  3. A.J. S. says:

    If God did exist, really? I stopped reading right after that, get your stuff together or don’t write at all.

    1. For Real says:

      Oh, relax.

    2. Joe Zbiciak says:

      You must be a blast at parties.

    3. Andrew Penner says:

      Is this an attempt to dismiss the original post based on word choice as opposed to content? If you stop reading at your first offensive content you will not read much on the web 🙂

    4. J. B. says:

      Someone holds (presumably) different religious beliefs to you and that means they should stop writing and “get it together” (what does that even mean in this context – it makes no sense)? Never mind the fact that it’s totally irrelevant, I’m more interested in why you are apparently so special that you get to be judge of who can write. (Not really though – you’re a joke and so is your god.)

    5. Chops says:

      Possibly you’re not neuro-typical and the real meaning of these words is lost on you. If you think a statement about the existence of god was made here, then you’ve completely misunderstood.

    6. The Dude says:

      There was a god, once. But he’s dead now.

    7. Mark Gomersbach says:

      What’s so wrong with that sentence that you would discard the rest of the content and respond with a weak attempt at an insult?

      Anyway, I enjoyed reading this, as everybody who first learns about it starts losing hair.

    8. Eddy Current says:

      GOD is REAL, unless declared INTEGER

      That leaves the question: REAL*4 or REAL*8?
      Since neither 4 nor 8 equals 3, we can safely deny the existence of the so-called trinity.

  4. john mullee says:

    https://docs.python.org/3/tutorial/floatingpoint.html

    “Python keeps the number of digits manageable by displaying a rounded value instead:

    >>> 1 / 10
    0.1

    Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction.”

  5. Joe Zbiciak says:

    It gets weirder than that. I get different results if I compile for 32-bit vs. 64-bit:

    $ gcc-7.1.0 -m64 r.c
    $ ./a.out
    0.501782303180000055
    $ gcc-7.1.0 -m32 r.c
    $ ./a.out
    0.501782303179999944
    $ gcc-9.2.0 -m64 r.c
    $ ./a.out
    0.501782303180000055
    $ gcc-9.2.0 -m32 r.c
    $ ./a.out
    0.501782303179999944

    The 32-bit SysV ABI is actually pretty terrible with regards to floating point rounding consistency.

    Here’s another great example of just how messed up it is:

    #include <stdio.h>

    float fp_test(void) {
        volatile float a = (float)(1ull << 60);
        volatile float b = (float)(1 << 0);
        return a + b;
    }

    int main() {
        volatile float c = (float)(1ull << 60);
        printf("%g\n", fp_test() - c);
        return 0;
    }

    If you compile this for the 32-bit and 64-bit ABIs, you’ll get different results:

    $ gcc-9.2.0 -m32 -fno-inline abi.c -o abi-32
    $ gcc-9.2.0 -m64 -fno-inline abi.c -o abi-64
    $ ./abi-32
    1
    $ ./abi-64
    0

    That anomaly arises from the fact that the i386 SysV ABI has floating-point values returned on the x87 register stack, where they retain more precision than if they were rounded to the requested size. (Page 38: i386 SysV ABI)
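
    A sketch of one way to squeeze that extra precision back out: store the return value to memory before using it (this is what -ffloat-store does globally; fp_test is the function above, and rounded_result is an illustrative name):

    float fp_test(void);

    float rounded_result(void) {
        volatile float r = fp_test();  /* the 32-bit store rounds to a 24-bit significand */
        return r;
    }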

    And there’s an additional caveat in the i386 SysV ABI regarding rounding modes.

    Looking around, it seems GCC does implement a flag, -frounding-math, that should disable the assumption of the same rounding mode everywhere. But… just go read the associated PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678

    It looks like the closest thing I’ve found to an official position is at the bottom of the page here: https://gcc.gnu.org/wiki/FloatingPointMath

    The tl;dr, as I see it? “On i386 and M68K, we throw our hands in the air and do the fast thing, not the pedantically bitwise-accurate thing, as that’s what we think most folks actually want.”

    Note on x86 and m68080 floating-point math
    For legacy x86 processors without SSE2 support, and for m68080 processors, GCC is only able to fully comply with IEEE 754 semantics for the IEEE double extended (long double) type. Operations on IEEE double precision and IEEE single precision values are performed using double extended precision. In order to have these operations rounded correctly, GCC would have to save the FPU control and status words, enable rounding to 24 or 53 mantissa bits and then restore the FPU state. This would be far too expensive.

    The extra intermediate precision and range may cause flags not to be set or traps not to be raised. Also, for double precision, double rounding may affect the final results. Whether or not intermediate results are rounded to double precision or extended precision depends on optimizations being able to keep values in floating-point registers. The option -ffloat-store prevents GCC from storing floating-point results in registers. While this avoids the indeterministic behavior just described (at great cost), it does not prevent accuracy loss due to double rounding.

    On more modern x86 processors that support SSE2, specifying the compiler options -mfpmath=sse -msse2 ensures all float and double operations are performed in SSE registers and correctly rounded. These options do not affect the ABI and should therefore be used whenever possible for predictable numerical results. For x86 targets that may not support SSE2, and for m68080 processors, use long double where exact rounding is required and explicitly convert to float or double when necessary. Using long double prevents loss of 80-bit accuracy when values must be spilled to memory. See x87note for further details.

    For more information on mixing x87 and SSE math, see Math_Optimization_Flags.

  6. Joe Landman says:

    Hmmm. I copied this to a file t.c, and compiled with -O3 on an old Core i7-2670QM CPU running Linux Mint 19.3:

    joe@lightning:~$ gcc -O3 t.c
    joe@lightning:~$ ./a.out
    x/y = 0.501782303180000055

    This is gcc 7.5.0-3ubuntu~18.04. Could you describe your environment in greater detail?

    1. I specify the docker image. I cannot be more precise than that. Have you tried running the script and generating the docker container?

      1. Joe Landman says:

        Ok, I missed that you were doing this with a 32-bit target. Mine is a 64-bit platform. SSE and above will behave more sanely.

  7. Warren Henning says:

    Is it considered acceptable if the program output changes based on optimization levels? Changing -O0 to -O3 in that Godbolt link changed the program output for me: https://godbolt.org/z/rDTmn8. I love computers!

  8. Mathias Gaunard says:

    The compiler is allowed to perform any operation with higher precision than requested.

    There is apparently an interesting edge case: your operation, when done with infinite precision, then rounded to the 80-bit Intel format, then rounded again to standard binary64, does not yield the same result as when it is rounded directly from infinite precision to standard binary64.
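
    A minimal sketch of that double rounding, assuming an x86-64 target where long double is the 80-bit x87 format and plain double arithmetic uses SSE2:

    #include <stdio.h>

    int main(void) {
        double x = 50178230318.0, y = 100000000000.0;
        long double extended = (long double)x / (long double)y; /* rounded to a 64-bit significand */
        double twice = (double)extended; /* rounded a second time, to 53 bits */
        double once  = x / y;            /* rounded once, directly to 53 bits */
        printf("once : %.18f\ntwice: %.18f\n", once, twice);
        return 0;
    }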

    1. Al says:

      Classic double rounding

  9. Heikki Kultala says:

    SSE does not support 64-bit double precision floating point numbers. It only supports 32-bit floating point numbers, so x87 is still used for double calculations.

    Only SSE2 supports doubles.

    Instead of -msse, use -msse2

  10. Moschops says:

    Is it a bug? I don’t know enough about the IEEE floating point specification, but is there a requirement to round to the nearest value rather than either of the two bounding values? Or if the requirement isn’t in that, where is it? What specifies that one of the two bounding values should be chosen over the other?

    1. Mathias Gaunard says:

      It is not a bug, see my comment.
      It is allowed for the compiler to use arbitrarily more precision than requested. Sometimes (in some pretty specific conditions), that leads to an error of one bit in the mantissa.

  11. Silviu says:

    I think it matters how you compiled the GCC compiler itself, since those computations are done at compile time (i.e., the compiler, which is an executable compiled with some other flags, does them for you).
    Never bothered to check; just thought it might be relevant.

  12. Michael K. says:

    Hi Daniel,
    I wonder what the existence of God has to do with floating-point properties or some compiler implementations.
    Though my faith is not always as big as I wish, I do believe in God. There are people who do not believe in God. I respect this. But I think it is arrogant to deny God because men think we have a total understanding of the world and the universe.
    I have deep sympathy for the ancient scholars who addressed themselves to theological questions. (Have you heard of Gottfried Wilhelm Leibniz, Joseph Fourier, Blaise Pascal or Michael Faraday?)
    Regards,
    Michael

    1. Vladimir Baus says:

      I cannot say for sure, but I’m pretty confident he meant no insult. To me it looks more like a saying, perhaps a bit clumsily translated from French?

      Back to the topic, I find it quite fascinating that there are so many interpretations of the standard, depending on how one reads it, of course. As a former member of a team that wrote a custom C compiler for an in-house DSP processor, I have to say my memory of the topic has faded quite a bit. Shame on me, though…

      1. To me it looks more like a saying, perhaps a bit clumsily translated from French?

        I do not think that it is a saying. It is a reference to the fact that we do not live in a universe where things work out nicely. There are ugly bits.

        1. Saheed says:

          Great sir, you just discovered one ugly bit (bug) in a system created by thousands of intelligent people. Permit me to point out that the very existence of God is reinforced by man’s continuous discovery of so many ugly bits we sneak into the perfect system He created, either deliberately or by mistake. Yet the system still works, even when for the most part we don’t know why.

          While I totally disagree with the reaction of the first person who pointed this out, the fact remains that the literary construct:

          If God did exist, the variable ratio would be 0.50178230318 and the story would end there. Unfortunately, there is no floating-point number that is exactly 0.50178230318.

          is offensive to people of faith. The logical conclusion of those two statements denies the existence of God. It is totally uncalled for in the context of the subject at hand. Judging by the target audience of the subject, it should be no surprise that those statements are interpreted as such.

          Your reply implies that it is a deliberate construct. I say it is just wrong and clearly inconsiderate.

          1. Moschops says:

            “I say it is just wrong and clearly inconsiderate.”

            Well it’s not. It’s a not uncommon English turn of phrase; completely forgivable for a non-native English speaker to not know it, and native English speakers who’ve missed it simply aren’t reading enough.

        2. Moschops says:

          Exactly so. That’s the meaning as commonly understood. It is no comment at all about religion, or the existence of god, or anything like that. It’s akin to saying “in a just world”, or “in a sensible universe”. This obsession with the literal seems to be more common amongst programmers; joyfully throwing out the pleasures of communication and language.

    2. José Luis says:

      Worse: I had heard of Newton.

    3. José Luis says:

      Michael, don’t be so sensitive (or may I use childish language?) about a common expression (outside of your house or church); we are talking in another dimension (a well-known dimension in which gods are irrelevant, as Laplace said to Napoléon Bonaparte).

  13. Marshall Ward says:

    For what it’s worth, this is what I get from GCC 10.1 with libc 2.31:

    x/y = 0.501782303180000055

    My machine is an AMD Ryzen 5 2600. Not yet sure how to compare the assembly output or various architecture compiler flags.

    Our group has recently had to grapple with various floating-point answer changes in libm between glibc 2.22 and 2.27, so it does seem some attention is being paid to these issues.

    1. Marshall Ward says:

      Just tested this on several GCC versions (6.1, 7.3, 8.1, 8.2, 9.1) on a Sandy Bridge Xeon, and also get the “correct” answer as above.

      I am unfortunately not very versed in Docker, but FP reproducibility is very important to our work and I’d like to understand this more, so I’ll try to familiarize myself with it.

    2. Marshall Ward says:

      Using -m32 appears to give me the “wrong” answer:

      ~/test/float$ gcc -m32 div.c
      ~/test/float$ ./a.out
      x/y = 0.501782303179999944

      (I also now see Joe Zbiciak has observed the same…)

  14. Tony Reix says:

    With GCC v9 on Fedora 32/x86_64: OK:

    gcc /tmp/C.c -o C ; ./C

    x/y = 0.501782303180000055

    gcc -O2 /tmp/C.c -o C ; ./C

    x/y = 0.501782303180000055

    gcc --version

    gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)

    With GCC v8.4 on AIX: OK:

    gcc C.c -o C ; ./C

    x/y = 0.501782303180000055

    gcc -maix64 C.c -o C ; ./C

    x/y = 0.501782303180000055

    gcc -O2 -maix64 C.c -o C ; ./C

    x/y = 0.501782303180000055

    gcc -O2 -maix32 C.c -o C ; ./C

    x/y = 0.501782303180000055

    gcc --version

    gcc (GCC) 8.4.0

    g++ -O2 -maix32 C.c -o C ; ./C

    x/y = 0.501782303180000055

    g++ -O2 -maix64 C.c -o C ; ./C

    x/y = 0.501782303180000055

    Same with GCC v9.3:

    g++ -O2 -maix32 C.c -o C ; ./C

    x/y = 0.501782303180000055

    g++ -O2 -maix64 C.c -o C ; ./C

    x/y = 0.501782303180000055

    g++-9 -O2 -maix64 C.c -o C ; ./C

    x/y = 0.501782303180000055

    g++-9 -O0 -maix64 C.c -o C ; ./C

    x/y = 0.501782303180000055

    g++-9 -O0 -maix32 C.c -o C ; ./C

    x/y = 0.501782303180000055

    g++-9 -O2 -maix32 C.c -o C ; ./C

    x/y = 0.501782303180000055

    g++-9 --version

    g++-9 (GCC) 9.3.0

    You should use Pari/gp!

    0.1+(0.2+0.3)

    %1 = 0.60000000000000000000000000000000000000

    (0.1+0.2)+0.3

    %2 = 0.60000000000000000000000000000000000000

    50178230318.0 / 100000000000.0

    %1 = 0.50178230318000000000000000000000000000

    Ha ha ha !!

  15. Marc-Olivier Killijian says:

    Hi Daniel, just to let you know that gcc 8.3.0 on Mac (target x86_64-apple-darwin18) works correctly and gives “x/y = 0.501782303180000055”.

  16. A. Banker says:

    The compiler chose to do banker’s rounding.
    You think the right answer is this:

    1100001010111011110101000011101111000010101110111101010000111011

    but it is bounded below by this:

    1100001010111011110101000011101011000010101110111101010000111010

    Round to even will round to the lower value if it ends in 0.
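
    For anyone who wants to inspect the bit patterns themselves, a small sketch (it uses memcpy to view the encoding, though a union works too):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static void print_bits(double d) {
        uint64_t u;
        memcpy(&u, &d, sizeof u);   /* view the IEEE 754 encoding of d */
        for (int i = 63; i >= 0; i--)
            putchar('0' + (int)((u >> i) & 1));
        putchar('\n');
    }

    int main(void) {
        print_bits(0.501782303180000055);  /* the round-to-nearest result */
        print_bits(0.501782303179999944);  /* the double-rounded result, one ulp below */
        return 0;
    }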

  17. zero says:

    Did you try compiling with -mpc64? From the man page,

    -mpc32
    -mpc64
    -mpc80
        Set 80387 floating-point precision to 32, 64 or 80 bits. When -mpc32 is specified, the significands of results of floating-point operations are rounded to 24 bits (single precision); -mpc64 rounds the significands of results of floating-point operations to 53 bits (double precision) and -mpc80 rounds the significands of results of floating-point operations to 64 bits (extended double precision), which is the default. When this option is used, floating-point operations in higher precisions are not available to the programmer without setting the FPU control word explicitly.

        Setting the rounding of floating-point operations to less than the default 80 bits can speed some programs by 2% or more. Note that some mathematical libraries assume that extended-precision (80-bit) floating-point operations are enabled by default; routines in such libraries could suffer significant loss of accuracy, typically through so-called "catastrophic cancellation", when this option is used to set the precision to less than extended precision.

  18. zero says:

    Go to godbolt.org with the godbolt code provided in this article, and add the compilation switch -mpc64. Problem gone. It looks like the solution is (advertised) in the manual.

    1. @zero

      Problem gone. It looks like the solution is (advertised) in the manual.

      Are you sure?

      I am not so sure. Yes, it will solve the 64-bit case, but does it ensure that 32-bit floats do not suffer from the same double-rounding problem? If the values are rounded to a 53-bit mantissa and then rounded again to a smaller mantissa, the same problem could happen again.

      As far as I can tell, there is only one correct way to solve the problem: round once, from the infinite-precision result to the desired precision, without any intermediate rounding.

      And then there is the warning about the catastrophic failure of some libraries if you invoke this 64-bit flag. Does it include the standard mathematics library? It might, right?

      As far as I can tell, the only proper fix is the SSE approach. But it is not enabled by default.

  19. zero says:

    Hi, did you find an example where operations with variables of type ‘float’ suffer from rounding due to performing the operation with data type ‘double’ before being rounded down to type ‘float’ again?

    One could also look at why -mpcXX is there. Looking at the relevant Intel manuals, one will find notes about the precision control bits. At first glance, it looks like GCC might be simply parroting what the hardware is doing (this is also the argument why the GCC bug you referenced was marked as invalid — and note the same bug’s comments suggest -mpc64 and setting the default path to SSE rather than the x87, as per the GCC documentation). Did you look at the actual Intel manuals on this?

    By the way, is the difference noted here within one ulp of the most precise result? Is that within the expected error for the operation in question? Is there an actual bug here? Of course, when exact values are required in compiled binaries, one could always specify the actual floating point value in binary (e.g. with a union).

    Regarding the other libraries that might be affected, one would have to read the relevant documentation to see if the libraries one wants to use are compiled (binary libraries) or can be compiled (from sources) to a level of precision that is compatible with one’s intent. This is just good practice.

    Finally, I am not celebrating this level of complexity…

  20. Hi, did you find an example where operations with variables of type ‘float’ suffer from rounding due to performing the operation with data type ‘double’ before being rounded down to type ‘float’ again?

    Well, the compiler is providing us no guarantee, right? How lucky do you feel?

    I could investigate, but I have already ruled out this mode of GCC as unacceptable.

    One could also look at why -mpcXX is there. Looking at the relevant Intel manuals, one will find notes about the precision control bits. At first glance, it looks like GCC might be simply parroting what the hardware is doing (this is also the argument why the GCC bug you referenced was marked as invalid — and note the same bug’s comments suggest -mpc64 and setting the default path to SSE rather than the x87, as per the GCC documentation). Did you look at the actual Intel manuals on this?

    I don’t think Intel can be blamed.

    For example, evidently the compiler is itself, at compile time, evaluating the expression with full precision. The net result is that if you do “double x = value1; double y = value2; if (x/y == value1/value2) …”, the expression will evaluate to false under some optimization levels but not others.

    The fact that the expressions evaluate to different values at compile time than at runtime has nothing to do with Intel.
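
    A concrete sketch of that mismatch; volatile keeps the left-hand division at run time while the right-hand side is folded by the compiler, so with -m32 x87 codegen this can print “different”:

    #include <stdio.h>

    int main(void) {
        volatile double x = 50178230318.0, y = 100000000000.0;
        if (x / y == 50178230318.0 / 100000000000.0)
            puts("same");       /* what you get with SSE2 codegen */
        else
            puts("different");  /* run-time x87 quotient vs. folded constant */
        return 0;
    }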

    By the way, is the difference noted here within one ulp of the most precise result? Is that within the expected error for the operation in question? Is there an actual bug here? Of course, when exact values are required in compiled binaries, one could always specify the actual floating point value in binary (e.g. with a union).

    At compile time, the evaluation is exact. It is only at runtime that it differs.

    Regarding the other libraries that might be affected, one would have to read the relevant documentation to see if the libraries one wants to use are compiled (binary libraries) or can be compiled (from sources) to a level of precision that is compatible with one’s intent. This is just good practice.

    The GNU math library publishes its ULP constraints, and you often have at most 1 ULP as a promise. Unfortunately, I found that this was optimistic: it does not apply when the x87 is active… the error can go much higher than stated in the documentation. Now, what happens when you set the precision to 64 bits is, from what we can tell, likely to be worse.

    I have not investigated the matter more, and I do not plan to. It is not a viable platform as far as I am concerned. Not in 2020.

    1. Wilco says:

      I don’t think Intel can be blamed.

      It is an x87 problem; you will not get this on other ISAs. There are other ISAs that also support an 80-bit long double, but they have instructions that produce float or double results (for example, 68k has fsmul and fdmul).

      For example, evidently the compiler is itself, at compile time, evaluating the expression with full precision. The net result is that if you do “double x = value1; double y = value2; if (x/y == value1/value2) …”, the expression will evaluate to false under some optimization levels but not others.

      That can only happen on x87. Floating-point expressions will give the same results at compile time and at run time by default.

      The fact that the expressions evaluate to different values at compile time than at runtime has nothing to do with Intel.

      A user can explicitly switch on aggressive FP optimizations, change the rounding mode or use flush-to-zero. This will change the result at runtime, and that is expected. However, on x87, results are different even if you don’t do any of this. And given all the bug reports, that is neither obvious nor expected!

  21. zero says:

    (Note SSE is enabled by default for 64 bits — enabling this default for 32 bits now would conceivably break every program that relied on the default settings… not worth that chaos, especially when you can control the behavior with a switch.)

  22. Wilco says:

    It’s worth updating the title – this is an x86-specific issue. Other ISAs like Arm, Power, MIPS, etc. do not have this issue – they fully support IEEE and give consistent answers.

    It’s well-known among floating point experts that the x87 floating point unit has many serious IEEE conformance issues. Let’s ignore the transcendental functions which are not only very slow but also incredibly inaccurate.

    Even when switched to the pc32/pc64 modes, it is not possible to compute floating-point values correctly. This is because the x87 always uses 15 exponent bits (it is still using the 80-bit format internally), so you still get double-rounded results. Even more strangely, the value may change when a register is stored to memory – only then is the extra exponent range reduced to the correct IEEE value!

    So if you use a value just computed in a register, you may get a different result than if it had been stored to memory. A compiler may spill variables at any time. You can use the same variable several times from a register, but if at some point it is stored/spilled, its value may suddenly change…
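
    A sketch of that effect; whether the difference is visible depends on the compiler, flags and register allocation, and the quotient helper is illustrative, present only so the division cannot be folded at compile time:

    #include <stdio.h>

    __attribute__((noinline))
    static double quotient(double a, double b) {
        return a / b;   /* with -m32, an x87 division whose result may stay in an 80-bit register */
    }

    int main(void) {
        double x = quotient(50178230318.0, 100000000000.0);
        volatile double stored = x;   /* this store rounds to a 64-bit double */
        if (x != stored)
            puts("x changed when it was stored to memory");
        else
            puts("x is stable");
        return 0;
    }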

    So an expression like (x > y && x > y) can evaluate to false even if the first x > y is true! Unbelievable, right?

    To avoid this you need to use -ffloat-store to force register values to memory so that the extra rounding happens immediately and the value is now consistent. This option slows down x87 floating point code significantly. So you get to choose between fast and inconsistent, or consistently incorrectly rounded and slow…

    In conclusion: if you care about floating point at all, do not ever use x87.

    1. Thanks for this informative post Wilco.

      1. Wilco says:

        You’re welcome. I learned much of this the hard way… I had the pleasure of debugging an instruction set simulator which used native floating point instructions to simulate IEEE floating point hardware. It worked fine for the developers (IIRC they used SPARC workstations), but it failed when we ran our IEEE testsuite on Windows. The developers were adamant the simulator was correct since our failing testcases passed on their system.

        After some time we convinced them to switch to a software floating point library on x86 since otherwise we couldn’t ever trust the simulator to correctly emulate IEEE floating point (you can’t test all inputs to be sure). For similar reasons modern compilers use software floating point to evaluate floating point constants.

        1. @Wilco

          You don’t get the same problems by default with LLVM clang on x86. That’s what I mean by “Intel cannot be blamed”.

          Can we blame Intel for producing a bad x87 design? Sure. But we are talking about something done in the 1980s… by people who probably retired a long time ago.

          I would rather blame the people today who insist that we must keep defaulting to x87 because otherwise… too many things might break.

          I don’t understand this attitude. When old things are bad, it is your duty to leave them behind and move on.

          Compilers should say “we no longer support x87, it is shit” and be done with it. If people want to keep supporting it, let them add crazy flags to their command lines.

          1. Wilco says:

            Yes, I completely agree that the -mfpmath default for 32-bit x86 should have been changed years ago in GCC. LLVM uses a different default, but unlike GCC it has no need to be “bug compatible” with many applications…

            In general, GCC maintainers tend to be very conservative. Change is seen as risky: it may break software or create unnecessary extra work. I managed to make various improvements to GCC defaults in the last few years, so there is progress, but unfortunately it is slow.

            It would be useful to create requests to change the -mfpmath default – if there is enough demand, maybe it will finally happen. Note the x86 backend maintainers (which include Intel employees, of course) would have the final say.

            1. Wilco says:

              Btw you only need -mfpmath=sse, no need for a -msse2 or a -march flag, even for GCC5: https://godbolt.org/z/T26rRk

              1. @Wilco Please run my docker script.

  23. Jamie says:

    You can solve this properly with GCC’s fpu_control.h as described at:

    http://christian-seiler.de/projekte/fpmath/

    #include <fpu_control.h>

    fpu_control_t fpu_oldcw, fpu_cw;
    _FPU_GETCW(fpu_oldcw); // store the old control word
    fpu_cw = (fpu_oldcw & ~_FPU_EXTENDED & ~_FPU_DOUBLE & ~_FPU_SINGLE) | precision;
    _FPU_SETCW(fpu_cw);
    // calculations here
    _FPU_SETCW(fpu_oldcw); // restore the old control word

    Precision may be one of:

    _FPU_SINGLE
    Single precision
    _FPU_DOUBLE
    Double precision
    _FPU_EXTENDED
    Double-extended precision

    1. Looks like a great comment.

  24. The ALGLIB library ran into similar issues with floating-point rounding early on, and took the easy (but awkward) way out: defining functions for each of the comparison operators. In my version (over on GitHub under AlgLib_cpp), I removed the functions and put the operators back in, then made a note of the problem and its resolution in the manual (Manual.htm, sections 3.1 and 5.5) – the problem being specific to the Intel FPU and to later processors that include it.

    It required setting the FPU up to round to IEEE double floating point (fldcw 0x27f, or _FPU_SETCW(0x27f) under GCC, using the macro _FPU_SETCW from <fpu_control.h>). It appears the issue was fixed later in GCC, though I don’t know which version the repair took place in.

    So, in relation to the issue you brought up, there is some history behind it showing that there has been trial and error in trying to get the rounding issues resolved.