12th December 2019, 5 min read

Are 64-bit random identifiers free from collision?

8 thoughts on “Are 64-bit random identifiers free from collision?”

Sylvain Hallé says:

December 13, 2019 at 12:05 am

Interesting! I’d be curious to know why you wrote a post about that. Did you encounter such a problem in your own work?
1. Daniel Lemire says:
  
  December 13, 2019 at 2:44 pm
  
  Yes. Some people report using 64-bit hash values as identifiers and they sometimes believe that it makes collision highly improbable. That’s true when dealing with tiny sets, but as I demonstrate, for many practical cases, collisions are actually quite likely.
  
  So, yes, this post is motivated by my discussions with people building systems.
Peter says:

December 13, 2019 at 9:58 pm

And with 64-bits, they are easy to find! I did this exercise recently with FNV-1a: https://twitter.com/pstbrn/status/1201044024892829697
1. Peter says:
  
  December 13, 2019 at 10:00 pm
  
  (This is hash collisions, not random identifiers. But finding them used the same principle – birthday attack.)
Thomas Müller Graf says:

December 14, 2019 at 5:29 pm

Randomly generated UUIDs, which are 122 bits, are less problematic: the probability of getting a duplicate is much lower than the probability of being hit by a meteorite (I calculated that some time ago and put it on Wikipedia, but somebody edited it away).

128-bit values are even better. However, there is a risk if you generate them with a “lower quality” hash function (not sure what low quality is… Murmur hash seems OK, SipHash is better I think). If an attacker can supply the data, then MD4, MD5, and even SHA-1 are risky: the attacker could use specially crafted data that will result in a hash collision. So, for this case, better use SHA-256.
John the Scott says:

December 21, 2019 at 4:45 pm

what about a 160 bit hash built by ripemd(sha256), i.e, bitcoin?
1. Daniel Lemire says:
  
  December 23, 2019 at 4:11 pm
  
  160-bit is perfectly fine.
  
  There is a huge difference between 64-bit and 160-bit.
Ricardo says:

April 18, 2023 at 12:52 am

And that is why I always choose sequential id assignment instead of random 🙂

This is, however, an interesting idea. What would be the probability to collide twice in a row if whenever we find a collision we retry and generate a new random 64-bot id?