I know you know this, but to clarify for others: a division may require a multiply and two shifts (case 3 in the Granlund-Montgomery-Warren algorithm) or more (case 4). Computing binomial coefficients can use a single shift because it falls into the special case of a division which is known a priori to be exact.

Also, you can eke a tiny bit more range out of fastbinomial(n,k) if you do the multiplication by f.inverse before the shift.

The largest value which must be represented internally within fastbinomial is fastbinomial(n,k) * k; if that overflows 64 bits, the result will be inaccurate. However, the overflow does not matter for the multiplication by the inverse, only the shift, so only fastbinomial(n,k) << f.shift needs to fit into 64 bits.

I was just working on a similar problem, and for my application I have 0 <= n <= 64 and 0 <= k <= 64, and storing a lookup table is acceptable. Here’s a trick I used for using symmetry to only store half of each row.

We start by precomputing all binom(n, k) via the recurrence B(n, k) = B(n - 1, k - 1) + B(n - 1, k) and B(n, n) = 1 and concatenating the rows into a flat array. Then, computing B(n, k) involves looking up the start of the nth row and taking the kth element of that row. The ith row has length i + 1, so the nth row starts at 1 + 2 + ... + n, which is n * (n + 1) / 2.

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1

But, we’d like to store only the first half of each row, computing B(n, k) as B(n, min(k, n - k)).

1
1
1 2
1 3
1 4 6

Now, the length of the nth row is ceil((i + 1) / 2), and since ceil((n + 1) / m) = floor(n / m) + 1, we can compute this as n // 2 + 1 using integer division. To compute the start of the nth row in our table, we want to compute \sum_{i=0}^{n-1} {i // 2 + 1} in closed form.

The resulting table is compact, and querying it is efficient.

fn binomial_coefficient(n: u8, k: u8) -> u64 {
let (q, r) = (n as usize / 2, n as usize % 2);
let row_start = (q + r) * (q + 1);
let k = cmp::min(k, n - k) as usize;
COEFFICIENT_TABLE[row_start + k]
}

George Spelvinsays:I know you know this, but to clarify for others: a division may require a multiply and two shifts (case 3 in the Granlund-Montgomery-Warren algorithm) or more (case 4). Computing binomial coefficients can use a single shift because it falls into the special case of a division which is

known a priori to be exact.Also, you can eke a tiny bit more range out of

`fastbinomial(n,k)`

if you do the multiplication by`f.inverse`

before the shift.The largest value which must be represented internally within

`fastbinomial`

is`fastbinomial(n,k) * k`

; if that overflows 64 bits, the result will be inaccurate. However, the overflow does not matter for the multiplication by the inverse, only the shift, so only`fastbinomial(n,k) << f.shift`

needs to fit into 64 bits.Daniel Lemiresays:I do not understand this comment. Can you elaborate?

Roman Cheplyakasays:And if anyone is wondering why the division is exact (like I was), I wrote up an explanation here: https://ro-che.info/articles/2020-03-22-binomial-coefficients-integer-division

Sujay Jayakarsays:I was just working on a similar problem, and for my application I have

`0 <= n <= 64`

and`0 <= k <= 64`

, and storing a lookup table is acceptable. Here’s a trick I used for using symmetry to only store half of each row.We start by precomputing all

`binom(n, k)`

via the recurrence`B(n, k) = B(n - 1, k - 1) + B(n - 1, k)`

and`B(n, n) = 1`

and concatenating the rows into a flat array. Then, computing`B(n, k)`

involves looking up the start of the`n`

th row and taking the`k`

th element of that row. The`i`

th row has length`i + 1`

, so the`n`

th row starts at`1 + 2 + ... + n`

, which is`n * (n + 1) / 2`

.`1`

1 1

1 2 1

1 3 3 1

1 4 6 4 1

But, we’d like to store only the first half of each row, computing

`B(n, k)`

as`B(n, min(k, n - k))`

.`1`

1

1 2

1 3

1 4 6

Now, the length of the

`n`

th row is`ceil((i + 1) / 2)`

, and since`ceil((n + 1) / m) = floor(n / m) + 1`

, we can compute this as`n // 2 + 1`

using integer division. To compute the start of the`n`

th row in our table, we want to compute`\sum_{i=0}^{n-1} {i // 2 + 1}`

in closed form.`n: 0 1 2 3 4 5 ...`

row_len: 1 1 2 2 3 3 ...

row_start: 0 1 2 4 6 9 ...

If we have even

`n = 2m`

, then the sum of the previous`row_len`

s is just`2 * (1 + 2 + ... m)`

:`row_start(2m) = 2 * (1 + 2 + ... + m)`

= 2 * m * (m + 1) / 2

= m * (m + 1)

Then, if we have odd

`n = 2m + 1`

, we need to add in`row_len(2m)`

.`row_start(2m + 1) = m * (m + 1) + row_len(2m)`

= m * (m + 1) + (m + 1)

= (m + 1) * (m + 1)

So we can now combine the two cases:

`row_start(n) = (n // 2 + n % 2) * (n // 2 + 1)`

The resulting table is compact, and querying it is efficient.

`fn binomial_coefficient(n: u8, k: u8) -> u64 {`

let (q, r) = (n as usize / 2, n as usize % 2);

let row_start = (q + r) * (q + 1);

let k = cmp::min(k, n - k) as usize;

COEFFICIENT_TABLE[row_start + k]

}

My code is in this pull request.