26th October 2015, 10 min read

Crazily fast hashing with carry-less multiplications

12 thoughts on “Crazily fast hashing with carry-less multiplications”

J. Andrew Rogers says:

October 26, 2015 at 11:02 pm

What does the avalanche bias profile look like? One of the problems with most hash function families is that they trade quality for speed. As computing systems become larger and faster, bias in non-cryptographic hash functions starts to manifests as real pathologies in the software that uses them. The development of algorithms like MurmurHash, CityHash, etc were driven by this.

One of my motivations for the research that led to the MetroHash functions is that I needed non-cryptographic hash functions that were significantly faster than CityHash but which had randomness comparable to cryptographic functions (which you do not get from CityHash) for some use cases inside database engines. And it looked like an interesting research problem, of course.
1. Daniel Lemire says:
  
  October 27, 2015 at 12:12 am
  
  You might be interested in Section 6.1 of our manuscript. We discuss SMHasher and the avalanche effect specifically.
  1. J. Andrew Rogers says:
    
    October 27, 2015 at 6:14 am
    
    Thanks, I feel a little silly now for having not read that first. 🙂
    
    Interesting stuff.
Jonathan Graehl says:

October 26, 2015 at 11:56 pm

1. only competitive-speed on recent CPUs, right?

2. small key (64 bytes or less) performance is worse than xxHash ( https://github.com/Cyan4973/xxHash ) and Farmhash ( https://github.com/google/farmhash ) which are both faster than City, but perhaps there’s a weaker quality use of this instruction that can give something suitable e.g. for typical string-keyed hash tables? All I know about quality is that these all past smhasher, but perhaps they cheated a bit by tuning to meet those particular tests.
1. Daniel Lemire says:
  
  October 27, 2015 at 12:21 am
  
  Note that CityHash, xxHash, Farmhash… are not universal. CLHash is XOR universal. (Almost so on long strings.)
  
  1. only competitive-speed on recent CPUs, right?
  
  Yes. On older CPUs, the carry-less multiplication had a low throughput (except maybe on AMD CPUs where it did better). This is reviewed in our earlier paper (Strongly universal string hashing is fast).
  
  2. small key (64 bytes or less) performance is worse (…)
  
  For short strings, performance is likely driven by latency, and the carry-less multiplications have relatively high latency (7 cycles). So I expect it will be difficult to get really exciting results on short strings with carry-less multiplications, at least if you care about the latency of the hash function.
Xandor says:

October 27, 2015 at 5:52 pm

I’m sorry, I’d like asking why it is possible to have a*4 == b*4 with a != b in a computer? Maybe I’m missing some basic math but I can’t see that.

Thank you.
1. Daniel Lemire says:
  
  October 27, 2015 at 6:32 pm
  You can verify that, in Java, you have the following:
```
int a = 0;
int b = 1073741824;
a != b; // true
4 * a == 4 * b ; // true
```
  The result is not Java specific of course, but I have to write the code in some language.
  1. PaweÅ‚ Boraca says:
    
    October 29, 2015 at 11:51 pm
    
    In the post it says:
    a * 4 == b * 4
    whereas in your comment it’s:
    4 * a == a * b
    Which is a totally different thing and I’m still confused.
    1. Daniel Lemire says:
      
      October 30, 2015 at 12:27 am
      Let me try with actual code.
      
      You can run the following Java program:
      
      public class CrazyMath { public static void main (String[] args) { int a = 0; int b = 1073741824; System.out.println(4 * a == 4 * b); } }
      
      Or you can write the following C program:
      
      #include #include int main() { uint32_t a = 0; uint32_t b = 1073741824; printf("%s\n",a*4==4*b?"equal":"not equal"); }
      
      Or you can try the following Go program…
      
      package main
      import "fmt"
      
      func main() { a:=0; b:=1073741824; fmt.Println(a4==b4); }
      
      And so forth, and so forth…
      
      Please try it out for yourself.
      1. Chris Rosendorf says:
        
        November 17, 2015 at 3:20 pm
        
        But isn’t that because 1073741824 * 4 = 2^32 * 2? Where a 32bit integer cannot go beyond? Thus wrapping itself around to 0? (I havn’t studied computer engineering, only been programming for a couple years, so only speaking from hands on experience, no theoretical knowledge, basically).
        
        Daniel Lemire says:
        
        November 17, 2015 at 3:53 pm
        
        Right. Most programming languages rely on integers spanning a finite and pre-determined number of bits. That’s because this is how the underlying hardware works.
        
        A language like Python will support arbitrary integers… this comes at a performance cost however.
        
        JavaScript is just strange as it has no support for integers per se. The following is true in JavaScript…
        
        100000000000000000==100000000000000001
Gabe Morales says:

March 6, 2023 at 9:33 am

One use of carry-less multiplication I’ve run across is morton encoding. Morton encoding is a process to produce an octree from a list of objects to retain locality of reference. Assuming a space of objects, if we take their X and Y (and Z and any other dimension) position as binary, then interleave the bitstring into one binary string, then order these resultant binary strings from 0000… to 1111…, we arrive at a Z-ordered curve where objects in our list that are physically close together are also close together in the list. This can reduce collision checks by orders of magnitude. The problem is the process for encoding a morton string is usually too slow for real time applications without dedicated opcodes (eg PDEP and PEXT) or memory-expensive LUTs.

Well, one neat side effect of carry-less multiplication is that any number multiplied by itself will yield the original bitstring interleaved with 0s, which is perfect for bitwise operation to combine into one morton code!