Daniel Lemire's blog

, 1 min read

Do you realize that you are using random hashing?

Hashing is a common concept in programming. It is the process of mapping any object to an integer. It is used in fast search algorithms (like Rabin-Karp) and in hash maps. Almost all non-trivial software depends on hashing. Hashing often works best when it appears random.

It used to be that hashing was deterministic. The creators of a language would specify once and for all how a given string would be hashed. For example, in Java, the hash value of the string “a” is 97 ("a".hashCode()). No matter how often you run a program, the hash value will always be 97.

This can be a security problem because an adversary can find many strings that hash to the value 97. And then hashing no longer appears random.

For this reason, language designers are migrating to random hashing. In random hashing, the hash value of the string “a” can change every time you run your program. Random hashing is now the default in Python (as of version 3.3), Ruby (as of version 1.9) and Perl (as of version 5.18).

In Ruby, random hashing means that if you run the following program many times, you will get different outputs…

puts "a".hash

The same is true of the following Python program…

print ({"a":1,"b":2})

Here is the output I got, running it twice:

$ python3 tmp.py
{'a': 1, 'b': 2}
$ python3 tmp.py
{'b': 2, 'a': 1}

To my knowledge, Java still relies on deterministic hashing by default. However, the specification requires you to expect random hashing: the return value of hashCode() can change from one program execution to another. How many Ruby, Python and Perl programmers realize that they are relying on random hashing?