Daniel Lemire's blog

Fast exact integer divisions using floating-point operations (ARM edition)

4 thoughts on “Fast exact integer divisions using floating-point operations (ARM edition)”

  1. Cyril says:

    One important note about the UDIV/SDIV instructions on arm64, from the ARMv8 ISA: “The divide instructions do not generate a trap upon division by zero, but write zero to the destination register.”

  2. eden segal says:

    Can you check the same on 16-bit integers and 32-bit floats? Maybe the ARM processor’s divider is not fast (say, it goes through a lot of uops to get the result), but a 32-bit float division is more likely to be fast.
    Another caveat is that on SKX you are pushed further toward a division-free algorithm, since you have only a double-pumped 256-bit divider for 512-bit vectors. Still, there is no SIMD integer divider, so it’s much faster than scalar integer division.

    1. You can pull the same trick with 16-bit integers, yes. It is a good observation.

  3. Timothy Herchen says:

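    This is nice. Note that if you need signed (floor) integer division this way, you can set the FP control register to round toward -inf (_MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN), or fesetround(FE_DOWNWARD) for portability) and convert the quotient with a rounding-mode-aware operation such as lrint, since a plain C cast always truncates.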