I know you know this, but to clarify for others: a division may require a multiply and two shifts (case 3 in the Granlund-Montgomery-Warren algorithm) or more (case 4). Computing binomial coefficients can use a single shift because it falls into the special case of a division which is known a priori to be exact.
Also, you can eke a tiny bit more range out of fastbinomial(n,k) if you do the multiplication by f.inverse before the shift.
The largest value which must be represented internally within fastbinomial is fastbinomial(n,k) * k; if that overflows 64 bits, the result will be inaccurate. However, the overflow does not matter for the multiplication by the inverse, only the shift, so only fastbinomial(n,k) << f.shift needs to fit into 64 bits.
I was just working on a similar problem, and for my application I have 0 <= n <= 64 and 0 <= k <= 64, and storing a lookup table is acceptable. Here’s a trick I used for using symmetry to only store half of each row.
We start by precomputing all binom(n, k) via the recurrence B(n, k) = B(n - 1, k - 1) + B(n - 1, k) and B(n, n) = 1 and concatenating the rows into a flat array. Then, computing B(n, k) involves looking up the start of the nth row and taking the kth element of that row. The ith row has length i + 1, so the nth row starts at 1 + 2 + ... + n, which is n * (n + 1) / 2.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
But, we’d like to store only the first half of each row, computing B(n, k) as B(n, min(k, n - k)).
1
1
1 2
1 3
1 4 6
Now, the length of the nth row is ceil((i + 1) / 2), and since ceil((n + 1) / m) = floor(n / m) + 1, we can compute this as n // 2 + 1 using integer division. To compute the start of the nth row in our table, we want to compute \sum_{i=0}^{n-1} {i // 2 + 1} in closed form.
The resulting table is compact, and querying it is efficient.
fn binomial_coefficient(n: u8, k: u8) -> u64 {
let (q, r) = (n as usize / 2, n as usize % 2);
let row_start = (q + r) * (q + 1);
let k = cmp::min(k, n - k) as usize;
COEFFICIENT_TABLE[row_start + k]
}
I know you know this, but to clarify for others: a division may require a multiply and two shifts (case 3 in the Granlund-Montgomery-Warren algorithm) or more (case 4). Computing binomial coefficients can use a single shift because it falls into the special case of a division which is known a priori to be exact.
Also, you can eke a tiny bit more range out of
fastbinomial(n,k)
if you do the multiplication byf.inverse
before the shift.The largest value which must be represented internally within
fastbinomial
isfastbinomial(n,k) * k
; if that overflows 64 bits, the result will be inaccurate. However, the overflow does not matter for the multiplication by the inverse, only the shift, so onlyfastbinomial(n,k) << f.shift
needs to fit into 64 bits.I do not understand this comment. Can you elaborate?
And if anyone is wondering why the division is exact (like I was), I wrote up an explanation here: https://ro-che.info/articles/2020-03-22-binomial-coefficients-integer-division
I was just working on a similar problem, and for my application I have
0 <= n <= 64
and0 <= k <= 64
, and storing a lookup table is acceptable. Here’s a trick I used for using symmetry to only store half of each row.We start by precomputing all
binom(n, k)
via the recurrenceB(n, k) = B(n - 1, k - 1) + B(n - 1, k)
andB(n, n) = 1
and concatenating the rows into a flat array. Then, computingB(n, k)
involves looking up the start of then
th row and taking thek
th element of that row. Thei
th row has lengthi + 1
, so then
th row starts at1 + 2 + ... + n
, which isn * (n + 1) / 2
.1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
But, we’d like to store only the first half of each row, computing
B(n, k)
asB(n, min(k, n - k))
.1
1
1 2
1 3
1 4 6
Now, the length of the
n
th row isceil((i + 1) / 2)
, and sinceceil((n + 1) / m) = floor(n / m) + 1
, we can compute this asn // 2 + 1
using integer division. To compute the start of then
th row in our table, we want to compute\sum_{i=0}^{n-1} {i // 2 + 1}
in closed form.n: 0 1 2 3 4 5 ...
row_len: 1 1 2 2 3 3 ...
row_start: 0 1 2 4 6 9 ...
If we have even
n = 2m
, then the sum of the previousrow_len
s is just2 * (1 + 2 + ... m)
:row_start(2m) = 2 * (1 + 2 + ... + m)
= 2 * m * (m + 1) / 2
= m * (m + 1)
Then, if we have odd
n = 2m + 1
, we need to add inrow_len(2m)
.row_start(2m + 1) = m * (m + 1) + row_len(2m)
= m * (m + 1) + (m + 1)
= (m + 1) * (m + 1)
So we can now combine the two cases:
row_start(n) = (n // 2 + n % 2) * (n // 2 + 1)
The resulting table is compact, and querying it is efficient.
fn binomial_coefficient(n: u8, k: u8) -> u64 {
let (q, r) = (n as usize / 2, n as usize % 2);
let row_start = (q + r) * (q + 1);
let k = cmp::min(k, n - k) as usize;
COEFFICIENT_TABLE[row_start + k]
}
My code is in this pull request.