Daniel Lemire's blog

, 6 min read

Computing the inverse of odd integers

8 thoughts on “Computing the inverse of odd integers”

  1. Sam Kramer says:

    Interesting article!
    Those using C++14 or newer can use the constexpr specifier to evaluate the functions at compile time:

    constexpr uint64_t f64(uint64_t x, uint64_t y) {
    return y * ( 2 – y * x );
    }

    constexpr uint64_t findInverse64(uint64_t x) {
    uint64_t y = x;
    y = f64(x,y);
    y = f64(x,y);
    y = f64(x,y);
    y = f64(x,y);
    y = f64(x,y);
    return y;
    }

    1. Great!

      Does this work with C++11?

      1. Sam Kramer says:

        C++11 Solution:

        constexpr uint64_t rec_f64(uint64_t x, uint64_t y, int n) {
        return (n > 0) ? rec_f64(x, f64(x, y), n-1) : f64(x, y);
        }

        constexpr uint64_t findInverse64(uint64_t x) {
        return rec_f64(x, x, 5);
        }

        See the following blog post to learn why this is necessary:
        http://fendrich.se/blog/2012/11/22/compile-time-loops-in-c-plus-plus-11-with-trampolines-and-exponential-recursion/

  2. Brian Kessler says:

    In terms of applications, this came up for me recently when looking at an algorithm for exact division (remainder known to be zero).

    You can actually do a little better and get 5 correct starting bits with y = (3 * x) ^ 2 and in that case you also only need 4 iterations for the 64 bit inverse.

    That’s according to “personal communication from Montgomery” from this paper: https://arxiv.org/pdf/1303.0328.pdf

  3. Albert Chan says:

    For older machine, this may be faster to calculate 64-bits inverse:

    static uint32_t inverse32(uint32_t a)
    {
    uint32_t x = (3*a) ^ 2;
    x *= 2 - a * x;
    x *= 2 - a * x;
    x *= 2 - a * x;
    return x;
    }

    static uint64_t inverse64(uint64_t a) // (a x) mod 2^64 = 1
    {
    uint32_t terms, a0 = a; // a = a1<<32 | a0
    uint32_t x0 = inverse32(a0); // a0 x0 = 1
    terms = (uint32_t) (a >> 32) * x0; // term a1 x0
    terms += ((uint64_t) a0 * x0) >> 32; // term a0 x0 / 2^32
    uint32_t x1 = x0 * -terms; // a0 x1 + terms = 0
    return (uint64_t) x1 << 32 | x0; // x = a^-1 mod 2^64
    }

  4. Albert Chan says:

    Found an old post with formula that trebled bits accuracy per step.
    For inverse32(), only 2 steps are needed.

    https://groups.google.com/forum/#!msg/sci.crypt/UI-UMbUnYGk/hX2-wQVyE3oJ

    uint32_t inverse32(uint32_t a)
    {
    uint32_t t, x = (3*a) ^ 2;
    t = a * x - 1;
    x *= t * t - t + 1;
    t = a * x - 1;
    x *= t * t - t + 1;
    return x;
    }

  5. LCR says:

    I might be late to the party, but here’s a solution to remove some more cycles by hacking an 8bit approximation with a small table (especially efficient if you have the bextr instruction).

    uint64_t inverse64(uint64_t a) {
    uint8_t t[] = {
    2, 174, 210, 190, 66, 174, 210, 254,
    2, 46, 82, 190, 66, 46, 82, 254,
    };
    uint64_t x;
    x = t[(a>>1)&15] - a; // 8bit precision
    x *= 2 - a * x; // 16
    x *= 2 - a * x; // 32
    x *= 2 - a * x; // 64
    return x;
    }

    1. Here’s a solution to remove some more cycles by hacking an 8bit approximation with a small table

      It is interesting, but I doubt that it will be much faster.