Fast exact integer divisions using floating-point operations (ARM edition)
In my latest post, I explained how you could accelerate 32-bit integer divisions by transforming them into 64-bit floating-point divisions. Indeed, on most processors, 64-bit floating-point numbers can accurately represent all 32-bit integers.
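To make the trick concrete, here is a minimal sketch in C of what such a division looks like (the function name is mine, for illustration; see the previous post for the actual code and the proof of exactness):

```c
#include <stdint.h>

// Every 32-bit integer fits exactly in a double's 53-bit significand,
// so converting both operands, dividing, and truncating back yields
// the exact integer quotient for 32-bit operands.
uint32_t divide32_via_double(uint32_t n, uint32_t d) {
  return (uint32_t)((double)n / (double)d);
}
```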
It is a strange result: Intel processors seem to do a lot better with floating-point divisions than with integer divisions.
Recall the numbers that I got for the throughput of division operations:
operation | throughput
---|---
64-bit integer division | 25 cycles
32-bit integer division (compile-time constant) | 2+ cycles
32-bit integer division | 8 cycles
32-bit integer division via 64-bit float | 4 cycles
I decided to run the same test on a 64-bit ARM processor (AMD A1100):
operation | throughput
---|---
64-bit integer division | 7 ns
32-bit integer division (compile-time constant) | 2 ns
32-bit integer division | 6 ns
32-bit integer division via 64-bit float | 18 ns
These numbers are rough, as my benchmark is naive (see code). Still, on this particular ARM processor, 64-bit floating-point divisions are no faster (in throughput) than 32-bit integer divisions. So ARM processors differ quite a bit from Intel x64 processors in this respect.
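For context, a naive throughput benchmark in this spirit might look like the sketch below. This is my own illustration (the divisor and iteration count are made up), not the actual benchmark code linked from the post:

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

// Division via 64-bit float, as sketched earlier.
static uint32_t divide32_via_double(uint32_t n, uint32_t d) {
  return (uint32_t)((double)n / (double)d);
}

// volatile so the compiler cannot treat the divisor as a compile-time
// constant and replace the division with a multiplication
static volatile uint32_t divisor_source = 7;

int main(void) {
  const uint32_t N = 100000000;
  uint32_t d = divisor_source;
  uint32_t sum = 0; // accumulate results so the loop is not optimized away
  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (uint32_t i = 1; i <= N; i++) {
    sum += divide32_via_double(i, d);
  }
  clock_gettime(CLOCK_MONOTONIC, &end);
  double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9 +
                      (double)(end.tv_nsec - start.tv_nsec);
  printf("%.2f ns per division (checksum: %u)\n", elapsed_ns / N, sum);
  return 0;
}
```

A loop like this measures throughput rather than latency only loosely, since the additions create a dependency chain; it is enough for rough comparisons like the tables above.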