17th November 2017, 3 min read

Fast exact integer divisions using floating-point operations (ARM edition)

Cyril says:

November 18, 2017 at 11:52 am

One important note about the UDIV/SDIV instructions on arm64, from the ARMv8 ISA: “The divide instructions do not generate a trap upon division by zero, but write zero to the destination register.”

eden segal says:

December 5, 2017 at 10:46 am

Can you check the same with 16-bit integers and 32-bit floats? Maybe the ARM processor's divider is not fast, say it goes through a lot of uops to get the result, but 32-bit float division is more likely to be fast.
Another caveat is that on SKX you are pushed even more toward a division-free algorithm, as you have only a double-pumped 256-bit divider for a 512-bit vector. There is still no integer divider, so it is much faster than scalar int.
1. Daniel Lemire says:
  
  December 5, 2017 at 8:50 pm
  
  You can pull the same trick with 16-bit integers, yes. It is a good observation.
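  A minimal sketch of what the 16-bit variant might look like, assuming unsigned operands and a nonzero divisor. The names `div16_fp` and `count_mismatches` are hypothetical, chosen only for illustration. Both operands fit exactly in a float's 24-bit significand, so the rounded quotient should never reach the next integer, but it is worth checking against plain integer division rather than taking that on faith:

  ```c
  #include <stdint.h>

  // Hypothetical helper: divide two 16-bit unsigned integers through a
  // single-precision floating-point division. Assumes b != 0.
  uint16_t div16_fp(uint16_t a, uint16_t b) {
      // Both operands are exactly representable as floats.
      float q = (float)a / (float)b;
      // Converting a float to an integer type truncates toward zero.
      return (uint16_t)q;
  }

  // Hypothetical checker: for a fixed divisor b != 0, counts the dividends
  // for which the floating-point route disagrees with integer division.
  uint32_t count_mismatches(uint16_t b) {
      uint32_t bad = 0;
      for (uint32_t a = 0; a <= UINT16_MAX; a++) {
          if (div16_fp((uint16_t)a, b) != (uint16_t)(a / b)) {
              bad++;
          }
      }
      return bad;
  }
  ```

  Calling `count_mismatches` in a loop over every divisor from 1 to 65535 gives an exhaustive check of all pairs; whether this is actually faster than the hardware integer divide would need to be measured on the target processor.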
Timothy Herchen says:

May 8, 2023 at 8:16 pm

This is nice. Note that if you need signed (floor) integer division this way, you can set the FP control register to round toward -inf (_MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN) on x86, or fesetround(FE_DOWNWARD) for portability).