
# Fast exact integer divisions using floating-point operations (ARM edition)

In my latest post, I explained how you could accelerate 32-bit integer divisions by transforming them into 64-bit floating-point divisions. Indeed, on most processors, 64-bit floating-point numbers can represent all 32-bit integers exactly.
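As a reminder, the trick can be sketched in C as follows. This is a minimal illustration, not necessarily the code from the earlier post: the function name `divide_via_double` is hypothetical, and I assume unsigned 32-bit operands and a nonzero divisor.

```c
#include <stdint.h>

// Hypothetical helper: divide two 32-bit unsigned integers using a
// 64-bit floating-point division instead of an integer division.
// Assumes b != 0.
static inline uint32_t divide_via_double(uint32_t a, uint32_t b) {
    // A double has a 53-bit significand, so both conversions are exact.
    double da = (double)a;
    double db = (double)b;
    // The cast truncates toward zero, which yields the integer quotient.
    return (uint32_t)(da / db);
}
```

For example, `divide_via_double(7, 2)` returns 3, just as `7 / 2` does with integer operands.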

It is a strange result: Intel processors seem to do a lot better with floating-point divisions than integer divisions.

Recall the numbers that I got for the throughput of division operations:

| Operation | Cost per division |
|---|---|
| 64-bit integer division | 25 cycles |
| 32-bit integer division (compile-time constant) | 2+ cycles |
| 32-bit integer division | 8 cycles |
| 32-bit integer division via 64-bit float | 4 cycles |

I decided to run the same test on a 64-bit ARM processor (AMD A1100):

| Operation | Cost per division |
|---|---|
| 64-bit integer division | 7 ns |
| 32-bit integer division (compile-time constant) | 2 ns |
| 32-bit integer division | 6 ns |
| 32-bit integer division via 64-bit float | 18 ns |

These numbers are rough: my benchmark is naive (see code). Still, on this particular ARM processor, 64-bit floating-point divisions are not faster (in throughput) than 32-bit integer divisions. Going through 64-bit floats takes about three times as long as a plain 32-bit integer division (18 ns versus 6 ns). So, at least in this respect, ARM processors can differ quite a bit from Intel x64 processors.
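To give an idea of what such a naive throughput measurement might look like, here is a sketch in C. It is not the benchmark linked above: the function names (`sum_div_int`, `sum_div_double`, `ns_per_division`) are hypothetical, and I assume that timing with the standard `clock()` function is precise enough for a rough estimate. Each function sums the quotients so that the compiler cannot discard the divisions.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

// Sum of a[i] / divisor using regular integer divisions.
uint64_t sum_div_int(const uint32_t *a, size_t n, uint32_t divisor) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] / divisor;
    }
    return sum;
}

// Same sum, but each division goes through 64-bit floating point.
uint64_t sum_div_double(const uint32_t *a, size_t n, uint32_t divisor) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (uint32_t)((double)a[i] / (double)divisor);
    }
    return sum;
}

// Average time per division, in nanoseconds, for one of the functions above.
// The computed sum is stored in *result so that the caller can check it.
double ns_per_division(uint64_t (*f)(const uint32_t *, size_t, uint32_t),
                       const uint32_t *a, size_t n, uint32_t divisor,
                       uint64_t *result) {
    clock_t start = clock();
    *result = f(a, n, divisor);
    clock_t end = clock();
    return 1e9 * (double)(end - start) / (double)CLOCKS_PER_SEC / (double)n;
}
```

You would call `ns_per_division` with each function on a large array and a divisor that is only known at run time, then compare the two timings. Because the divisions in such a loop are independent of one another, this measures throughput rather than latency. A compiler may still vectorize or otherwise transform the loops, so results from a sketch like this one should be taken with a grain of salt.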