I would hope that anyone with a passing familiarity with floating point numbers and numerical issues would not find this surprising!
As a rule of thumb, I assume that any transformation which changes the order of operations, or combines or splits operations into new ones, may have a different result than the original formulation.
I’m curious about counter-examples to this rule. I can think of only a few trivial transformations that work, like a - b = a + (-b), because in that case the magnitude of everything is the same after the transformation; but for pretty much anything interesting it seems, naively, like the rule would apply.
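A minimal C++ sketch (not from the original post) of both halves of that: reassociating a sum changes the rounded result, while the trivial negation rewrite does not:

    #include <cstdio>

    int main() {
        // Reassociation changes the rounded result: (1e16 + -1e16) + 1.0 is 1.0,
        // but 1e16 + (-1e16 + 1.0) is 0.0, because -1e16 + 1.0 rounds back to -1e16.
        double a = 1e16, b = -1e16, c = 1.0;
        std::printf("(a+b)+c = %g, a+(b+c) = %g\n", (a + b) + c, a + (b + c));

        // The trivial rewrite a - b == a + (-b) always holds: negation only flips
        // the sign bit, so both sides round exactly the same difference.
        double x = 0.1, y = 0.3;
        std::printf("x - y == x + (-y): %s\n", (x - y == x + (-y)) ? "true" : "false");
        return 0;
    }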
I would hope that anyone with a passing familiarity with floating point numbers and numerical issues would not find this surprising!
Indeed, but it could be true (that x/y = x * (1/y)) in C/C++ if you use the “-ffast-math” flag on some compilers. When examining output from godbolt, I find that compilers do not seem to make use of the optimizations described in Reynold’s post. However, if you use the “-ffast-math” flag, they do turn the division into a multiplication, but without any apparent correction factor.
Travis Downs says:
Yes, because that’s basically what -ffast-math means: perform transformations on floating point “expressions” that do not necessarily preserve the exact C/C++-rules-following IEEE 754 results of the literal expression in the source.
This doesn’t always mean a worse result either: sometimes the result of -ffast-math is closer to the true result than the version that follows the literal source (e.g., when using an FMA to replace discrete multiply and addition, since it preserves precision in the intermediate value).
I see what you are saying though: -ffast-math may perform a transformation on what appears to be a single operation, transforming it to a reciprocal multiplication and giving a different result. So it’s not just multi-operation expressions that might be eligible for this type of transformation.
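To make that concrete, here is a small C++ sketch (my own, assuming a build without -ffast-math and a correctly rounded std::fma) that applies both transformations by hand:

    #include <cmath>
    #include <cstdio>

    int main() {
        // Division versus multiplication by the reciprocal: 1.0/y is rounded once
        // and the multiply rounds again, so the result can differ from the
        // correctly rounded x/y (for x = 5, y = 3 it is exactly 1 ulp off).
        double x = 5.0, y = 3.0;
        std::printf("x/y     = %.17g\n", x / y);
        std::printf("x*(1/y) = %.17g\n", x * (1.0 / y));

        // FMA versus separate multiply-and-add: fma rounds only once, so it can be
        // *more* accurate than the literal a*a + c while still being different.
        double a = 1.0 + std::ldexp(1.0, -27);     // 1 + 2^-27, exactly representable
        double c = -(1.0 + std::ldexp(1.0, -26));  // -(1 + 2^-26), also exact
        volatile double prod = a * a;  // separate multiply rounds away the 2^-54 term
        std::printf("a*a + c      = %g\n", prod + c);          // prints 0
        std::printf("fma(a, a, c) = %g\n", std::fma(a, a, c)); // prints 2^-54
        return 0;
    }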
Indeed, but it could be true
I’m going to admit to not understanding what the “it” is in this sentence.
The “it” was a reference to “x/y = x * (1/y) is true”. Bad English on my part.
Travis Downs says:
I see.
It ends up being true because both sides are calculating it the less precise way, not because both sides are calculating it the precise way!
Every day I count my blessings that I don’t actually need to use floating point math in any significant capacity in my day-to-day activities :).
foobar says:
An interesting question on the subject: what floating point values of a satisfy the following for all x:
x / a == x * (1 / a)
The only trivial solutions I can think of are 0 or values of the form 2^n (n being an integer, of course), because the inverse of any other a would have an infinite binary expansion. But floating point arithmetic is finite in precision, so are there non-trivial variants that work correctly (at least in some rounding mode)?
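A tiny C++ check of the power-of-two intuition (an illustration, not a proof):

    #include <cstdio>

    int main() {
        // For a power of two, 1/8 is exact and multiplying by it only rescales the
        // exponent (barring overflow/underflow), so both sides are the correctly
        // rounded quotient and always compare equal.
        double x = 0.1234567890123456;
        std::printf("x/8 == x*(1/8): %s\n", (x / 8.0 == x * (1.0 / 8.0)) ? "yes" : "no");

        // For a non-power of two, 1/3 is already rounded and the extra rounding in
        // the multiply shows up for some inputs, e.g. x = 5.
        std::printf("5/3 == 5*(1/3): %s\n", (5.0 / 3.0 == 5.0 * (1.0 / 3.0)) ? "yes" : "no");
        return 0;
    }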
foobar says:
Quick empirical estimation suggests that no values that avoid errors, apart from powers of two, really exist. There are constants for which rounding errors are rarer than for others, but even for the square root of two, which is one of the better candidates, almost 13% of results are wrong, and for the square root of three, which is on the worse end, around 50% of results are off.
On average, multiplication by floating point inverse seems to differ from IEEE division for about 27% of (uniformly distributed) values.
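A rough sketch of that kind of experiment (my own; the exact percentages depend on how x is sampled, here with significands uniform in [1, 2)):

    #include <cmath>
    #include <cstdio>
    #include <random>

    // Estimate how often x * (1/a) differs from the correctly rounded x / a.
    static double mismatch_percent(double a, int trials) {
        std::mt19937_64 rng(42);
        std::uniform_real_distribution<double> dist(1.0, 2.0);  // sample x uniformly in [1, 2)
        const double inv = 1.0 / a;
        int mismatches = 0;
        for (int i = 0; i < trials; ++i) {
            double x = dist(rng);
            if (x / a != x * inv) ++mismatches;
        }
        return 100.0 * mismatches / trials;
    }

    int main() {
        const int n = 1000000;
        std::printf("a = sqrt(2): %.1f%% mismatches\n", mismatch_percent(std::sqrt(2.0), n));
        std::printf("a = sqrt(3): %.1f%% mismatches\n", mismatch_percent(std::sqrt(3.0), n));
        std::printf("a = 0.5:     %.1f%% mismatches\n", mismatch_percent(0.5, n));
        return 0;
    }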
The difference identified appears to be related to how round-off is applied to the last bit for floating point numbers.
-ffast-math also modifies how round-off is applied.
The other modifier is multi-threaded calculation, which frequently has a more significant effect on round-off: not only in the last bit, but as larger precision loss in sums.
When summing floating point values, any assessment of the significance of this last-bit error needs to consider the accuracy available from the input values, which will typically show that the uncertainty already present in the inputs is much larger than this last-bit error. Variations in the last bit are rarely a significant problem.
Travis Downs says:
The other modifier is multi-threaded calculation, which frequently has a more significant effect on round-off: not only in the last bit, but as larger precision loss in sums.
I’m not quite following this. By multi-threaded I assume you mean breaking a serial computation into parts, calculating the parts and then recombining the result?
Although this leads to a different result in general it is not obvious to me that it generally leads to a less precise result.
Indeed, for basic examples like summing all the elements of a vector, I would expect the multiple accumulators needed for parallel sums to reduce the error, since the magnitudes are smaller and the sum is split across more mantissa bits.
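A toy C++ comparison of what I mean (an accuracy sketch only, not a benchmark; built without -ffast-math so the compensated reference sum is not optimized away). It prints how far a single-accumulator sum and a four-accumulator “parallel-style” sum each land from a Kahan-summation reference:

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        // Some data to sum.
        std::mt19937_64 rng(1);
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        std::vector<double> v(1 << 20);
        for (double& x : v) x = dist(rng);

        // Serial sum: one accumulator.
        double serial = 0.0;
        for (double x : v) serial += x;

        // "Parallel-style" sum: four accumulators, the way a threaded or SIMD split
        // would partition the work, combined at the end.
        double acc[4] = {0.0, 0.0, 0.0, 0.0};
        for (std::size_t i = 0; i < v.size(); i += 4)
            for (std::size_t j = 0; j < 4; ++j)
                acc[j] += v[i + j];
        double split = (acc[0] + acc[1]) + (acc[2] + acc[3]);

        // Compensated (Kahan) sum as a higher-accuracy reference.
        double ref = 0.0, comp = 0.0;
        for (double x : v) {
            double y = x - comp;
            double t = ref + y;
            comp = (t - ref) - y;
            ref = t;
        }

        std::printf("serial sum error vs reference: %g\n", std::fabs(serial - ref));
        std::printf("split  sum error vs reference: %g\n", std::fabs(split - ref));
        return 0;
    }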
John Campbell says:
I agree that in some cases summing the parts can reduce the apparent error.
When subtracting two nearly equal values (a-b), the accuracy is limited not by the last-bit adjustment but by the available accuracy of a and b themselves. So I often find that the maximum error estimate, based on the (storage) accuracy of each input value, can be more significant than the error due to using the inverse.
It is in sums, rather than products, that accuracy needs to be reviewed.
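A toy illustration of the cancellation point (hypothetical numbers, not John’s):

    #include <cstdio>

    int main() {
        // a and b are each correct to roughly 16 significant digits, but they agree
        // in their first 9, so their difference has only about 7 meaningful digits:
        // the subtraction itself is exact here, and the lost accuracy was already
        // baked into the rounded (stored) inputs.
        double a = 1.000000001234567;  // stands in for a measured or stored value
        double b = 1.000000000000000;
        std::printf("a - b = %.17g\n", a - b);  // only the leading digits are trustworthy
        return 0;
    }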
Me says:
The main question is how big the loss in accuracy is compared to the performance gains.
In your example, the error probably is about 1 ulp? That is usually well within tolerance.
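One way to check the “about 1 ulp” intuition is to measure the difference directly from the bit patterns (a sketch that assumes finite, same-sign doubles):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Distance in units in the last place between two finite, same-sign doubles,
    // using the fact that their bit patterns are ordered.
    static std::uint64_t ulp_distance(double a, double b) {
        std::uint64_t ia, ib;
        std::memcpy(&ia, &a, sizeof a);
        std::memcpy(&ib, &b, sizeof b);
        return ia > ib ? ia - ib : ib - ia;
    }

    int main() {
        double x = 5.0, y = 3.0;
        // The reciprocal-multiply result is at most a couple of ulps away from the
        // correctly rounded quotient; for this pair it is exactly 1 ulp.
        std::printf("ulp distance: %llu\n",
                    (unsigned long long)ulp_distance(x / y, x * (1.0 / y)));
        return 0;
    }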
Yeah, 1/y is only exact when y is a power of two. Reference: https://hal.inria.fr/ensl-00409366
And the approximate number of exact results is also tight. Reference: http://perso.ens-lyon.fr/jean-michel.muller/fpdiv.html (in the supplement)