Daniel Lemire's blog

, 2 min read

Expected performance of a Bloom filter

2 thoughts on “Expected performance of a Bloom filter”

  1. By spending about 12 bits per value (12 bits time the size of the set)
    and using 8 hash functions, the probability of a false positive is
    about 1/256 (or 0.3%) which is reasonable.

    If you know a bit about software performance, the 8 bits could be
    concerning. Looking up 8 values at random location in memory is not
    cheap. And, indeed, when the element is in the set, you must check all
    12 locations. It is expensive.

    I think you’d still need to check 8, not 12 locations, one for each hash function, regardless of the filter size. Was that a typo?

    Anyway, I’ve never heard of XOR filters, thanks for the reference and nice benchmarks!

    PS: your WordPress database doesn’t like non-Swedish diacritics:

    Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation ‘=’

    1. Thanks. It was indeed a typo.

      I know a lot about Unicode, but not a lot about how MySQL handles it.