Daniel Lemire's blog

, 8 min read

Stream VByte: breaking new speed records for integer compression

11 thoughts on “Stream VByte: breaking new speed records for integer compression”

  1. Bye Fruit says:

    This is interesting and the paper is very readable.

    That said, are there any cases where you’d use Stream VByte if you already had access to something like SIMD-BP128 or SIMD-PFOR?

    1. Thanks!

      In the end, it is about having options as there is no silver bullet as far as software and algorithms are concerned.

      For raw speed, it is hard to beat SIMD-BP128 if you have sizeable arrays and all you care about is decoding blocks to cache. And the compression rate is going to be quite good.

      Stream VByte is simpler in the sense that you require just a few instructions to decode a block of 4 integers. So you can trivially iterate over blocks of 4 integers and process them while they are still in registers, without writing them to cache.

      For SIMD-BP128, if you want to do something of the sort, it is simply going to be harder to program if you want to get the best possible performance (i.e., if you want to avoid writing the uncompressed data to cache and reloading it).

      To sum it up, sometimes it is just desirable to iterate over values in tiny blocks that stay in vector registers.

      We allude to this toward the end of the paper where we benchmark “seek” functions.

      So, for some applications, Stream VByte will definitively be more appropriate.

      1. Bye Fruit says:

        Great explanation, thanks.

  2. Martin Cohen says:

    Minor typo: “we need to keep the fed”.

  3. Mike Scirocco says:

    Very clear article, quite readable, and very generous of you not to patent this impressive bit of work.

  4. Kaartic Sivaraam says:

    I think there is a typo in the following snippet:

    you can compute y[0], y[0] + y[1], y[1] + y[2],… to recover x[0], x[1], x[2],…

    y[1] + y[2] does not give x[2]. x[1] + y[2] does give x[2], though.

    1. You are correct. This is what was meant: y[0], y[0] + y[1], y[0] + y[1] + y[2],.... It is a prefix sum.

  5. Dan says:

    Hi, great blog and papers! My interest is in analog-to-digital measurement streams of physical processes, say from an oscilloscope. These typically produce measurement integers 16 bit or less with zigzag encoded deltas typically no more than 8 bits. Subsequent operations on the decoded integers might be multiplying and adding calibration constants to recover the measured voltage data, finding elements greater than a voltage threshold, and performing DSP such as numerically integrating, window averaging, etc. The encoded stream would be loaded from disk or network as it does not make sense as an in-memory index like a database might use. Is Stream VByte, either off-the-shelf or adapted for 16 bit integers (e.g. 1 control bit instead of 2), a good fit for this? Is there any other approach you think might work better or are any performance gains unlikely to be realized by a delta encoding scheme? TYVM, I have learned a lot.

    1. Without domain knowledge, it can be difficult to make technical recommendations. I do not know enough about your area to comment.

      1. Dan says:

        That is reasonable. Thanks all the same. Are there some good searchable terms or resources you can recommend so I can learn how to evaluate the efficacy on my own?

        1. You can pick up an open source library and test different codecs on your data. You could start, for example, with https://github.com/lemire/SIMDCompressionAndIntersection if you are a C++ programmer. There are comparable libraries in most programming languages.