Daniel Lemire's blog


How fast is tabulation-based hashing? The downsides of Zobrist…

5 thoughts on “How fast is tabulation-based hashing? The downsides of Zobrist…”

  1. Jim Apple says:

    The usual strategy is to use a fast hash function to hash down to 32 or 64 bits and then use tabulation hashing on that 32-to-64-bit value.

    1. There are real-world systems that use Zobrist hashing as I described it. See for example Gigablast (https://www.gigablast.com/). Of course, you can use tabulation-based hashing as a final step, and then its speed is less of a concern. You should still expect it to be slower than arithmetic-based functions.

      1. Jim Apple says:

        I agree with your expectation of slowness compared to arithmetic functions. However, I suspect it is the fastest known hash function for 32-bit or 64-bit inputs that offers such strong theoretical guarantees for hash tables, including for linear probing and cuckoo hashing, as shown in the work of Mihai Pătrașcu and Mikkel Thorup.
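To make the scheme under discussion concrete, here is a minimal sketch of tabulation hashing over a 64-bit input: split the key into 8 bytes, look each byte up in its own table of 256 random 64-bit values, and XOR the results. This is illustrative code, not taken from any of the systems mentioned; the use of splitmix64 to fill the tables is an arbitrary choice.

```c
#include <stdint.h>

// One table per input byte, 256 random entries each: 8 * 256 * 8 bytes = 16 kB.
static uint64_t tables[8][256];

// Fill the tables with pseudorandom values (splitmix64, chosen arbitrarily).
void init_tables(uint64_t seed) {
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 256; j++) {
            seed += 0x9E3779B97F4A7C15ULL;
            uint64_t z = seed;
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
            tables[i][j] = z ^ (z >> 31);
        }
    }
}

// Tabulation hash of a 64-bit key: eight table lookups and seven XORs.
uint64_t tabulation_hash(uint64_t key) {
    uint64_t h = 0;
    for (int i = 0; i < 8; i++) {
        h ^= tables[i][(key >> (8 * i)) & 0xFF];
    }
    return h;
}
```

Note that the hot loop does no multiplications at all, only loads and XORs, which is why its speed hinges on the tables staying in cache.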

  2. Jim Apple says:

    Where is the crossover point between CLHash and tabulation hashing? That is, if my input is guaranteed to be 64 bytes long, in your recent experiments, is Zobrist or CLHash faster? What about 16 bytes or 1024 bytes?

    1. Your 16-byte Zobrist hashing would use 16 kB of memory. If cache faults occur, they could dominate the running time. Maybe you can hold the 16 kB in L1 cache and avoid expensive cache misses; then you might do well. But those 16 kB of L1 cache are a precious resource that the rest of your code might need…

      If you have little need for the L1 cache, maybe because you are working with small data structures, or you have mostly sequential access… then Zobrist would do well.

      Otherwise, it will start to generate cache faults.

      As usual, there is nothing better than actual testing.
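The arithmetic behind the 16 kB figure above can be spelled out: tabulation/Zobrist hashing needs one table per input byte, with 256 entries per table, each entry as wide as the hash value. A small sketch (the helper name is mine, for illustration):

```c
#include <stddef.h>

// Memory needed for tabulation/Zobrist tables:
// one table per input byte, 256 entries per table,
// each entry hash_bytes wide.
size_t zobrist_table_bytes(size_t key_bytes, size_t hash_bytes) {
    return key_bytes * 256 * hash_bytes;
}
// 16-byte keys with 32-bit (4-byte) hash values:
// 16 * 256 * 4 = 16384 bytes = 16 kB.
```

With 64-bit hash values the same 16-byte keys would need 32 kB, which already matches or exceeds the L1 data cache of many processors.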