, 2 min read
Ideal divisors: when a division compiles down to just a multiplication
The division instruction is one of the most expensive instruction in your CPU. Thus optimizing compilers often compile divisions by known constants down to a multiplication followed by a shift. However, in some lucky cases, the compiler does not even need a shift. I call the corresponding divisors ideal. For the math. geeks, they are related to Fermat numbers.
For 32-bit unsigned integers, we have two such divisors (641 and 6700417). For 64-bit unsigned integers, we have two different ones (274177 and 67280421310721). They are factors for 232 + 1 and 264 + 1 respectively. They are prime numbers.
So you have that
n/274177 = ( n * 67280421310721 ) >> 64
and
n/67280421310721 = ( n * 274177 ) >> 64.
In these expressions, the multiplication is the full multiplication (to a 128-bit result). It looks like there is still a ‘shift’ by 64 bits, but the ‘shift’ disappears in practice after compilation.
Of course, not all compilers may be able to pull this trick, but many do. Here is the assembly code produced by GCC when compiling n/274177 and n/67280421310721 respectively for an x64 target.
movabs rdx, 67280421310721
mov rax, rdi
mul rdx
mov rax, rdx
ret
mov rax, rdi
mov edx, 274177
mul rdx
mov rax, rdx
ret
You get similar results with ARM. It looks like ARM works hard to build the constant, but it is mostly a distraction again.
mov x1, 53505
movk x1, 0xf19c, lsl 16
movk x1, 0x3d30, lsl 32
umulh x0, x0, x1
ret
mov x1, 12033
movk x1, 0x4, lsl 16
umulh x0, x0, x1
ret
What about remainders?
What a good compiler will do is to first compute the quotient, and then do a multiplication and a subtraction to derive the remainder. It is the general strategy. Thus, maybe surprisingly, it is more expensive to compute a remainder than a quotient in many cases!
You can do a bit better in some cases. There is a trick from our Faster Remainder by Direct Computation paper that compilers do not know about. You can compute the remainder directly, using exactly two multiplications (and a few move instructions):
n % 274177 = (uint64_t( n * 67280421310721 ) * 274177) >> 64
and
n % 67280421310721 = (uint64_t( n * 274177 ) * 67280421310721) >> 64.
In other words, the following two C++ functions are strictly equivalent:
// computes n % 274177
uint64_t div1(uint64_t n) {
return n % 274177;
}
// computes n % 274177
uint64_t div2(uint64_t n) {
return (uint64_t( n * 67280421310721 )
* __uint128_t(274177)) >> 64;
}
Though the second function is more verbose and uglier, it will typically compile to more efficient code involving just two multiplications, back to back. It may seem a lot but it is likely better than what the compiler will do.
In any case, if you are asked to pick a prime number and you expect to have to divide by it, you might consider these ideal divisors.
Further reading. Integer Division by Constants: Optimal Bounds