By spending about 12 bits per value (12 bits time the size of the set)
and using 8 hash functions, the probability of a false positive is
about 1/256 (or 0.3%) which is reasonable.
If you know a bit about software performance, the 8 bits could be
concerning. Looking up 8 values at random location in memory is not
cheap. And, indeed, when the element is in the set, you must check all
12 locations. It is expensive.
I think you’d still need to check 8, not 12 locations, one for each hash function, regardless of the filter size. Was that a typo?
Anyway, I’ve never heard of XOR filters, thanks for the reference and nice benchmarks!
PS: your WordPress database doesn’t like non-Swedish diacritics:
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation ‘=’
I think you’d still need to check 8, not 12 locations, one for each hash function, regardless of the filter size. Was that a typo?
Anyway, I’ve never heard of XOR filters, thanks for the reference and nice benchmarks!
PS: your WordPress database doesn’t like non-Swedish diacritics:
Thanks. It was indeed a typo.
I know a lot about Unicode, but not a lot about how MySQL handles it.