What does the avalanche bias profile look like? One of the problems with most hash function families is that they trade quality for speed. As computing systems become larger and faster, bias in non-cryptographic hash functions starts to manifests as real pathologies in the software that uses them. The development of algorithms like MurmurHash, CityHash, etc were driven by this.

One of my motivations for the research that led to the MetroHash functions is that I needed non-cryptographic hash functions that were significantly faster than CityHash but which had randomness comparable to cryptographic functions (which you do not get from CityHash) for some use cases inside database engines. And it looked like an interesting research problem, of course.

Thanks, I feel a little silly now for having not read that first. 🙂

Interesting stuff.

Jonathan Graehlsays:

1. only competitive-speed on recent CPUs, right?

2. small key (64 bytes or less) performance is worse than xxHash ( https://github.com/Cyan4973/xxHash ) and Farmhash ( https://github.com/google/farmhash ) which are both faster than City, but perhaps there’s a weaker quality use of this instruction that can give something suitable e.g. for typical string-keyed hash tables? All I know about quality is that these all past smhasher, but perhaps they cheated a bit by tuning to meet those particular tests.

Note that CityHash, xxHash, Farmhash… are not universal. CLHash is XOR universal. (Almost so on long strings.)

1. only competitive-speed on recent CPUs, right?

Yes. On older CPUs, the carry-less multiplication had a low throughput (except maybe on AMD CPUs where it did better). This is reviewed in our earlier paper (Strongly universal string hashing is fast).

2. small key (64 bytes or less) performance is worse (…)

For short strings, performance is likely driven by latency, and the carry-less multiplications have relatively high latency (7 cycles). So I expect it will be difficult to get really exciting results on short strings with carry-less multiplications, at least if you care about the latency of the hash function.

Xandorsays:

I’m sorry, I’d like asking why it is possible to have a*4 == b*4 with a != b in a computer? Maybe I’m missing some basic math but I can’t see that.

But isn’t that because 1073741824 * 4 = 2^32 * 2? Where a 32bit integer cannot go beyond? Thus wrapping itself around to 0? (I havn’t studied computer engineering, only been programming for a couple years, so only speaking from hands on experience, no theoretical knowledge, basically).

Right. Most programming languages rely on integers spanning a finite and pre-determined number of bits. That’s because this is how the underlying hardware works.

A language like Python will support arbitrary integers… this comes at a performance cost however.

JavaScript is just strange as it has no support for integers per se. The following is true in JavaScript…

100000000000000000==100000000000000001

Gabe Moralessays:

One use of carry-less multiplication I’ve run across is morton encoding. Morton encoding is a process to produce an octree from a list of objects to retain locality of reference. Assuming a space of objects, if we take their X and Y (and Z and any other dimension) position as binary, then interleave the bitstring into one binary string, then order these resultant binary strings from 0000… to 1111…, we arrive at a Z-ordered curve where objects in our list that are physically close together are also close together in the list. This can reduce collision checks by orders of magnitude. The problem is the process for encoding a morton string is usually too slow for real time applications without dedicated opcodes (eg PDEP and PEXT) or memory-expensive LUTs.

Well, one neat side effect of carry-less multiplication is that any number multiplied by itself will yield the original bitstring interleaved with 0s, which is perfect for bitwise operation to combine into one morton code!

J. Andrew Rogerssays:What does the avalanche bias profile look like? One of the problems with most hash function families is that they trade quality for speed. As computing systems become larger and faster, bias in non-cryptographic hash functions starts to manifests as real pathologies in the software that uses them. The development of algorithms like MurmurHash, CityHash, etc were driven by this.

One of my motivations for the research that led to the MetroHash functions is that I needed non-cryptographic hash functions that were significantly faster than CityHash but which had randomness comparable to cryptographic functions (which you do not get from CityHash) for some use cases inside database engines. And it looked like an interesting research problem, of course.

Daniel Lemiresays:You might be interested in Section 6.1 of our manuscript. We discuss SMHasher and the avalanche effect specifically.

J. Andrew Rogerssays:Thanks, I feel a little silly now for having not read that first. 🙂

Interesting stuff.

Jonathan Graehlsays:1. only competitive-speed on recent CPUs, right?

2. small key (64 bytes or less) performance is worse than xxHash ( https://github.com/Cyan4973/xxHash ) and Farmhash ( https://github.com/google/farmhash ) which are both faster than City, but perhaps there’s a weaker quality use of this instruction that can give something suitable e.g. for typical string-keyed hash tables? All I know about quality is that these all past smhasher, but perhaps they cheated a bit by tuning to meet those particular tests.

Daniel Lemiresays:Note that CityHash, xxHash, Farmhash… are not universal. CLHash is XOR universal. (Almost so on long strings.)

1. only competitive-speed on recent CPUs, right?Yes. On older CPUs, the carry-less multiplication had a low throughput (except maybe on AMD CPUs where it did better). This is reviewed in our earlier paper (Strongly universal string hashing is fast).

2. small key (64 bytes or less) performance is worse (…)For short strings, performance is likely driven by latency, and the carry-less multiplications have relatively high latency (7 cycles). So I expect it will be difficult to get really exciting results on short strings with carry-less multiplications, at least if you care about the latency of the hash function.

Xandorsays:I’m sorry, I’d like asking why it is possible to have a*4 == b*4 with a != b in a computer? Maybe I’m missing some basic math but I can’t see that.

Thank you.

Daniel Lemiresays:You can verify that, in Java, you have the following:

The result is not Java specific of course, but I have to write the code in some language.

PaweÅ‚ Boracasays:In the post it says:

a * 4 == b * 4

whereas in your comment it’s:

4 * a == a * b

Which is a totally different thing and I’m still confused.

Daniel Lemiresays:Let me try with actual code.

You can run the following Java program:

Or you can write the following C program:

Or you can try the following Go program…

And so forth, and so forth…

Please try it out for yourself.

Chris Rosendorfsays:But isn’t that because 1073741824 * 4 = 2^32 * 2? Where a 32bit integer cannot go beyond? Thus wrapping itself around to 0? (I havn’t studied computer engineering, only been programming for a couple years, so only speaking from hands on experience, no theoretical knowledge, basically).

Daniel Lemiresays:Right. Most programming languages rely on integers spanning a finite and pre-determined number of bits. That’s because this is how the underlying hardware works.

A language like Python will support arbitrary integers… this comes at a performance cost however.

JavaScript is just strange as it has no support for integers per se. The following is true in JavaScript…

100000000000000000==100000000000000001

Gabe Moralessays:One use of carry-less multiplication I’ve run across is morton encoding. Morton encoding is a process to produce an octree from a list of objects to retain locality of reference. Assuming a space of objects, if we take their X and Y (and Z and any other dimension) position as binary, then interleave the bitstring into one binary string, then order these resultant binary strings from 0000… to 1111…, we arrive at a Z-ordered curve where objects in our list that are physically close together are also close together in the list. This can reduce collision checks by orders of magnitude. The problem is the process for encoding a morton string is usually too slow for real time applications without dedicated opcodes (eg PDEP and PEXT) or memory-expensive LUTs.

Well, one neat side effect of carry-less multiplication is that any number multiplied by itself will yield the original bitstring interleaved with 0s, which is perfect for bitwise operation to combine into one morton code!