Fast exact integer divisions using floating-point operations (ARM edition)
4 thoughts on “Fast exact integer divisions using floating-point operations (ARM edition)”
One important note about the UDIV/SDIV instructions on arm64, from the ARMv8 ISA: “The divide instructions do not generate a trap upon division by zero, but write zero to the destination register.”
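A minimal C sketch of what that note means in practice. In C, integer division by zero is undefined behaviour no matter what the hardware instruction does, and converting an infinite or NaN double to an integer is undefined as well. So both the integer path and the floating-point path need an explicit check if you want the "zero on division by zero" result portably. The function names below are hypothetical and not taken from the post.

```c
#include <stdint.h>

/* Hypothetical helper: mimics the ARM64 UDIV result (x / 0 == 0) in
   portable C, where dividing by zero is otherwise undefined. */
uint32_t udiv_zero_safe(uint32_t a, uint32_t b) {
    return b == 0 ? 0 : a / b;
}

/* Hypothetical helper: the same contract, computed through doubles.
   a / 0.0 is infinity (or NaN for 0 / 0), and converting that to
   uint32_t is undefined, so the zero check comes first. */
uint32_t udiv_fp_zero_safe(uint32_t a, uint32_t b) {
    if (b == 0) {
        return 0;
    }
    /* Both operands convert exactly to double; the cast truncates. */
    return (uint32_t)((double)a / (double)b);
}
```

If the divisor is known to be nonzero, the check can be dropped from either function.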
Can you check the same with 16-bit integers and 32-bit floats? Maybe the ARM integer divider is not fast (say, it goes through a lot of uops to get the result), but 32-bit float division is more likely to be fast.
Another caveat is that on SKX you are pushed even more toward a division-free algorithm, as you only have a double-pumped 256-bit divider for a 512-bit vector. There is still no vector integer divider, so this is still much faster than scalar integer division.
You can pull the same trick with 16-bit integers, yes. It is a good observation.
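As a rough sketch of the 16-bit variant discussed here, the C code below divides two unsigned 16-bit integers through single-precision floats and checks the result against integer division. Instead of testing all pairs, it tests the cases most likely to fail: numerators that are an exact multiple of the divisor, and numerators one below a multiple. The function names are hypothetical, and the divisor is assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 16-bit division via 32-bit floats.
   Assumes b != 0. Both operands convert exactly to float, and the
   cast back to an integer truncates the quotient. */
uint16_t div16_via_float(uint16_t a, uint16_t b) {
    return (uint16_t)((float)a / (float)b);
}

/* Counts disagreements with integer division for numerators of the
   form m*b and m*b - 1, over every nonzero 16-bit divisor b. */
uint32_t count_mismatches16(void) {
    uint32_t bad = 0;
    for (uint32_t b = 1; b <= UINT16_MAX; b++) {
        for (uint32_t a = b; a <= UINT16_MAX; a += b) {
            /* a is an exact multiple of b */
            if (div16_via_float((uint16_t)a, (uint16_t)b) != a / b) {
                bad++;
            }
            /* a - 1 sits just below a multiple of b */
            if (div16_via_float((uint16_t)(a - 1), (uint16_t)b) != (a - 1) / b) {
                bad++;
            }
        }
    }
    return bad;
}
```

Whether this is actually faster than the integer divide on a given ARM core is something to measure, not assume.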
This is nice. Note that if you need signed (floor) integer division this way, you can set the FP control register to round toward -inf (_mm_setcsr(_MM_ROUND_DOWN), or fesetround for portability).