How fast is tabulation-based hashing? The downsides of Zobrist…

5 thoughts on “How fast is tabulation-based hashing? The downsides of Zobrist…”
The usual strategy is to use a fast hash function to hash down to 32 or 64 bits, and then to apply tabulation hashing to that 32-bit or 64-bit value.
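For concreteness, here is a minimal sketch of that two-step strategy. It assumes a splitmix64-style mixer as a stand-in for whatever fast 64-bit hash you prefer, and fills the tabulation tables with rand() just to keep the example self-contained; a real implementation would seed the tables from a stronger random source.

```c
// Two-step strategy: reduce the input to a 64-bit value with a fast hash,
// then apply simple tabulation hashing to that value, one table per byte.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint64_t tab[8][256]; // 8 input bytes, one 256-entry table each

static void init_tables(void) {
  // For illustration only: real code should use a good random source.
  for (int i = 0; i < 8; i++)
    for (int j = 0; j < 256; j++)
      tab[i][j] = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
}

// Stand-in for "a fast hash function that hashes down to 64 bits"
// (a splitmix64-style mixer).
static uint64_t fast_hash64(uint64_t x) {
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

// Simple tabulation hashing of a 64-bit value: XOR one table lookup per byte.
static uint64_t tabulation_hash64(uint64_t x) {
  uint64_t h = 0;
  for (int i = 0; i < 8; i++)
    h ^= tab[i][(uint8_t)(x >> (8 * i))];
  return h;
}

int main(void) {
  init_tables();
  uint64_t key = 123456789;
  printf("%llx\n", (unsigned long long)tabulation_hash64(fast_hash64(key)));
  return 0;
}
```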
There are real-world systems that use Zobrist hashing as I described it. See for example Gigablast (https://www.gigablast.com/). Of course, you can use tabulation-based hashing as a final step, and then its speed is less of a concern. You should still expect it to be slower than arithmetic-based hash functions.
I agree with your expectation that it is slower than arithmetic-based functions. However, I suspect it is the fastest known hash function for 32-bit or 64-bit inputs that offers such strong theoretical guarantees for hash tables, including linear probing and cuckoo hashing, as shown in the work of Mihai Pătrașcu and Mikkel Thorup.
Where is the crossover point between CLHash and tabulation hashing? That is, if my input is guaranteed to be 64 bytes long, which was faster in your recent experiments: Zobrist or CLHash? What about 16 bytes or 1024 bytes?
Your 16-byte Zobrist hashing would use 16 kB of memory. Cache misses, if they occur, could easily dominate the running time. Maybe you can hold the 16 kB in L1 cache and avoid expensive cache misses; then you might do well. But those 16 kB of L1 cache are a precious resource that the rest of your code might need…
If you have little need for the L1 cache, maybe because you are working with small data structures or you have mostly sequential access, then Zobrist would do well.
Otherwise, it will start to generate cache misses.
As usual, there is nothing better than actual testing.
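In that spirit, here is a minimal sketch you could benchmark yourself. It assumes a fixed 16-byte key and 32-bit table entries, which works out to the 16 kB of tables mentioned above (with 64-bit entries the tables would grow to 32 kB). The tables are filled with rand() only to keep the example self-contained; a real implementation would use a stronger source of random bits.

```c
// Zobrist/tabulation hashing of a fixed 16-byte key:
// one 256-entry table per input byte, XOR one lookup per byte.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KEY_BYTES 16

static uint32_t ztab[KEY_BYTES][256]; // 16 * 256 * 4 bytes = 16 kB of tables

static void init_ztab(void) {
  // For illustration only: real code should use a good random source.
  for (int i = 0; i < KEY_BYTES; i++)
    for (int j = 0; j < 256; j++)
      ztab[i][j] = (uint32_t)rand();
}

// Zobrist hash of a 16-byte key.
static uint32_t zobrist16(const uint8_t key[KEY_BYTES]) {
  uint32_t h = 0;
  for (int i = 0; i < KEY_BYTES; i++)
    h ^= ztab[i][key[i]];
  return h;
}

int main(void) {
  init_ztab();
  uint8_t key[KEY_BYTES];
  memset(key, 0x2A, sizeof(key));
  printf("table footprint: %zu bytes, hash: %x\n",
         sizeof(ztab), (unsigned)zobrist16(key));
  return 0;
}
```

Timing this loop against CLHash on your own hardware, with the rest of your workload competing for the L1 cache, is the only way to locate the crossover point for your input sizes.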