Fast exact integer divisions using floating-point operations (ARM edition)
In my latest post, I explained how you could accelerate 32-bit integer divisions by transforming them into 64-bit floating-point divisions. Indeed, on most processors, 64-bit floating-point numbers can accurately represent all 32-bit integers.
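To make the trick concrete, here is a minimal sketch in C of what such a division looks like (the function name is mine, for illustration; see the previous post for the actual code and the proof of exactness):

```c
#include <stdint.h>

// Every 32-bit integer fits exactly in a double's 53-bit significand,
// so converting both operands, dividing, and truncating back yields
// the exact integer quotient for 32-bit operands.
uint32_t divide32_via_double(uint32_t n, uint32_t d) {
  return (uint32_t)((double)n / (double)d);
}
```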
It is a strange result: Intel processors seem to do a lot better with floating-point divisions than with integer divisions.
Recall the numbers that I got for the throughput of division operations:
operation | throughput
---|---
64-bit integer division | 25 cycles
32-bit integer division (compile-time constant) | 2+ cycles
32-bit integer division | 8 cycles
32-bit integer division via 64-bit float | 4 cycles
I decided to run the same test on a 64-bit ARM processor (AMD A1100):
operation | throughput
---|---
64-bit integer division | 7 ns
32-bit integer division (compile-time constant) | 2 ns
32-bit integer division | 6 ns
32-bit integer division via 64-bit float | 18 ns
These numbers are rough, as my benchmark is naive (see code). Still, on this particular ARM processor, 64-bit floating-point divisions are no faster (in throughput) than 32-bit integer divisions. So ARM processors differ quite a bit from Intel x64 processors in this respect.
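For context, a naive throughput benchmark in this spirit might look like the sketch below. This is my own illustration (the divisor and iteration count are made up), not the actual benchmark code linked from the post:

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

// Division via 64-bit float, as sketched earlier.
static uint32_t divide32_via_double(uint32_t n, uint32_t d) {
  return (uint32_t)((double)n / (double)d);
}

// volatile so the compiler cannot treat the divisor as a compile-time
// constant and replace the division with a multiplication
static volatile uint32_t divisor_source = 7;

int main(void) {
  const uint32_t N = 100000000;
  uint32_t d = divisor_source;
  uint32_t sum = 0; // accumulate results so the loop is not optimized away
  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (uint32_t i = 1; i <= N; i++) {
    sum += divide32_via_double(i, d);
  }
  clock_gettime(CLOCK_MONOTONIC, &end);
  double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9 +
                      (double)(end.tv_nsec - start.tv_nsec);
  printf("%.2f ns per division (checksum: %u)\n", elapsed_ns / N, sum);
  return 0;
}
```

A loop like this measures throughput rather than latency only loosely, since the additions create a dependency chain; it is enough for rough comparisons like the tables above.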