18th September 2017, 6 min read

Computing the inverse of odd integers

8 thoughts on “Computing the inverse of odd integers”

Sam Kramer says:

September 19, 2017 at 5:24 pm

Interesting article!
Those using C++14 or newer can use the constexpr specifier to evaluate the functions at compile time:

constexpr uint64_t f64(uint64_t x, uint64_t y) {
return y * ( 2 – y * x );
}

constexpr uint64_t findInverse64(uint64_t x) {
uint64_t y = x;
y = f64(x,y);
y = f64(x,y);
y = f64(x,y);
y = f64(x,y);
y = f64(x,y);
return y;
}
1. Daniel Lemire says:
  
  September 19, 2017 at 5:34 pm
  
  Great!
  
  Does this work with C++11?
  1. Sam Kramer says:
    
    September 19, 2017 at 6:56 pm
    
    C++11 Solution:
    
    constexpr uint64_t rec_f64(uint64_t x, uint64_t y, int n) {
    return (n > 0) ? rec_f64(x, f64(x, y), n-1) : f64(x, y);
    }
    
    constexpr uint64_t findInverse64(uint64_t x) {
    return rec_f64(x, x, 5);
    }
    
    See the following blog post to learn why this is necessary:
    http://fendrich.se/blog/2012/11/22/compile-time-loops-in-c-plus-plus-11-with-trampolines-and-exponential-recursion/
Brian Kessler says:

September 25, 2017 at 10:18 pm

In terms of applications, this came up for me recently when looking at an algorithm for exact division (remainder known to be zero).

You can actually do a little better and get 5 correct starting bits with y = (3 * x) ^ 2 and in that case you also only need 4 iterations for the 64 bit inverse.

That’s according to “personal communication from Montgomery” from this paper: https://arxiv.org/pdf/1303.0328.pdf
Albert Chan says:

June 3, 2018 at 9:51 pm

For older machine, this may be faster to calculate 64-bits inverse:

static uint32_t inverse32(uint32_t a) { uint32_t x = (3*a) ^ 2; x *= 2 - a * x; x *= 2 - a * x; x *= 2 - a * x; return x; }
static uint64_t inverse64(uint64_t a) // (a x) mod 2^64 = 1 { uint32_t terms, a0 = a; // a = a1<<32 | a0 uint32_t x0 = inverse32(a0); // a0 x0 = 1 terms = (uint32_t) (a >> 32) * x0; // term a1 x0 terms += ((uint64_t) a0 * x0) >> 32; // term a0 x0 / 2^32 uint32_t x1 = x0 * -terms; // a0 x1 + terms = 0 return (uint64_t) x1 << 32 | x0; // x = a^-1 mod 2^64 }
Albert Chan says:

June 4, 2018 at 3:27 am

Found an old post with formula that trebled bits accuracy per step.
For inverse32(), only 2 steps are needed.

https://groups.google.com/forum/#!msg/sci.crypt/UI-UMbUnYGk/hX2-wQVyE3oJ

uint32_t inverse32(uint32_t a) { uint32_t t, x = (3*a) ^ 2; t = a * x - 1; x *= t * t - t + 1; t = a * x - 1; x *= t * t - t + 1; return x; }
LCR says:

July 20, 2018 at 4:31 pm

I might be late to the party, but here’s a solution to remove some more cycles by hacking an 8bit approximation with a small table (especially efficient if you have the bextr instruction).

uint64_t inverse64(uint64_t a) { uint8_t t[] = { 2, 174, 210, 190, 66, 174, 210, 254, 2, 46, 82, 190, 66, 46, 82, 254, }; uint64_t x; x = t[(a>>1)&15] - a; // 8bit precision x *= 2 - a * x; // 16 x *= 2 - a * x; // 32 x *= 2 - a * x; // 64 return x; }
1. Daniel Lemire says:
  
  July 20, 2018 at 6:42 pm
  
  Here’s a solution to remove some more cycles by hacking an 8bit approximation with a small table
  
  It is interesting, but I doubt that it will be much faster.