Daniel Lemire's blog

, 6 min read

When accessing hash tables, how much time is spent computing the hash functions?

8 thoughts on “When accessing hash tables, how much time is spent computing the hash functions?”

  1. Evan says:

    Java uses a really simple hash function… Moreover, the hash value of each Integer object is the integer value itself.

    Are you sure about that? It seems pretty weak, and it doesn’t match what I’ve read elsewhere, though that information may be obsolete. My understanding is that hashCode will return the integer itself, but HashMap does some additional work to reduce unintentional collisions…

    That said, the Java hash function I’m thinking of is quite fast; significantly faster than anything else on http://www.burtleburtle.net/bob/hash/integer.html (I did some benchmarks on the algorithms on that page recently using uarch-bench; it’s pretty trivial, but I’d be happy to post it if you’re interested).

    1. You are correct, HashMap does additional work on top of the value returned by hashCode, but this only makes my point stronger.

  2. eugene says:

    Feels good to read this… That is the main reason for hot structures that we have in our code we keep a list of hash caches, usually small. But for the most common data it makes a big impact. Now I can safely put this link as a commemt to thatso that people would stop looking weird at me 🙂

  3. Me says:

    Rerun the benchmark with an object with just three ints so you measure more of the hash function, and less the memory layout.

    Also, note that e.g. the string class because of this effect caches its own hash code, to avoid recomputation cost. This, also benchmark four interested, one for caching the hash code, to get a clearer view.

  4. In my benchmark, my targets are going to be in L1 with high probability: I have only 100 lists and they are created right before I run the test function. So memory access should not be a problem.

    I would have used arrays for a simpler example but you can’t use arrays as keys in Java.

    Indeed, String hashes are cached (computed at most once) in Java. It is a common recommendation to cache hash values.

    I do include an example of hash-value caching in the companion code to this blog post. The blog post does not describe it because it is a secondary point.

  5. Perhaps this is an aside, but … a fixed size array of three integers? Is this a fixed aspect of your problem? (Does the dimension of three ever vary?) Would this be more efficient as a tuple?

    public class Tuple {
    int a, b, c;
    };

    Not tested, but guessing the compiler would emit more efficient code in this case.

    1. Antonio Badia says:

      I think it’d be interesting to see what happens if data does not fit into cache (which seems more realistic anyways): using (say) a large varchar (anything from 256 bytes to 1K) instead of 3 integers. My guess is that this would change the result tremendously, since hashing is notorious for destroying locality of reference anyways. Maybe when I have some free time…