26th May 2023, 2 min read

Expected performance of a Bloom filter

Laurentiu Nicola says:

May 28, 2023 at 7:59 am

By spending about 12 bits per value (12 bits time the size of the set)
and using 8 hash functions, the probability of a false positive is
about 1/256 (or 0.3%) which is reasonable.

If you know a bit about software performance, the 8 bits could be
concerning. Looking up 8 values at random location in memory is not
cheap. And, indeed, when the element is in the set, you must check all
12 locations. It is expensive.

I think you’d still need to check 8, not 12 locations, one for each hash function, regardless of the filter size. Was that a typo?

Anyway, I’ve never heard of XOR filters, thanks for the reference and nice benchmarks!

PS: your WordPress database doesn’t like non-Swedish diacritics:

Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation ‘=’
1. Daniel Lemire says:
  
  May 28, 2023 at 10:49 pm
  
  Thanks. It was indeed a typo.
  
  I know a lot about Unicode, but not a lot about how MySQL handles it.