27th September 2017, 8 min read

Stream VByte: breaking new speed records for integer compression

11 thoughts on “Stream VByte: breaking new speed records for integer compression”

Bye Fruit says:

September 27, 2017 at 7:59 pm

This is interesting and the paper is very readable.

That said, are there any cases where you’d use Stream VByte if you already had access to something like SIMD-BP128 or SIMD-PFOR?
1. Daniel Lemire says:
  
  September 27, 2017 at 8:20 pm
  
  Thanks!
  
  In the end, it is about having options as there is no silver bullet as far as software and algorithms are concerned.
  
  For raw speed, it is hard to beat SIMD-BP128 if you have sizeable arrays and all you care about is decoding blocks to cache. And the compression rate is going to be quite good.
  
  Stream VByte is simpler in the sense that you require just a few instructions to decode a block of 4 integers. So you can trivially iterate over blocks of 4 integers and process them while they are still in registers, without writing them to cache.
  
  For SIMD-BP128, if you want to do something of the sort, it is simply going to be harder to program if you want to get the best possible performance (i.e., if you want to avoid writing the uncompressed data to cache and reloading it).
  
  To sum it up, sometimes it is just desirable to iterate over values in tiny blocks that stay in vector registers.
  
  We allude to this toward the end of the paper where we benchmark “seek” functions.
  
  So, for some applications, Stream VByte will definitively be more appropriate.
  1. Bye Fruit says:
    
    September 27, 2017 at 8:50 pm
    
    Great explanation, thanks.
Martin Cohen says:

September 29, 2017 at 5:30 am

Minor typo: “we need to keep the fed”.
Mike Scirocco says:

January 20, 2018 at 5:01 am

Very clear article, quite readable, and very generous of you not to patent this impressive bit of work.
Kaartic Sivaraam says:

October 25, 2019 at 2:30 pm

I think there is a typo in the following snippet:

you can compute y[0], y[0] + y[1], y[1] + y[2],… to recover x[0], x[1], x[2],…

y[1] + y[2] does not give x[2]. x[1] + y[2] does give x[2], though.
1. Daniel Lemire says:
  
  October 26, 2019 at 2:06 am
  
  You are correct. This is what was meant: y[0], y[0] + y[1], y[0] + y[1] + y[2],.... It is a prefix sum.
Dan says:

December 13, 2023 at 11:31 pm

Hi, great blog and papers! My interest is in analog-to-digital measurement streams of physical processes, say from an oscilloscope. These typically produce measurement integers 16 bit or less with zigzag encoded deltas typically no more than 8 bits. Subsequent operations on the decoded integers might be multiplying and adding calibration constants to recover the measured voltage data, finding elements greater than a voltage threshold, and performing DSP such as numerically integrating, window averaging, etc. The encoded stream would be loaded from disk or network as it does not make sense as an in-memory index like a database might use. Is Stream VByte, either off-the-shelf or adapted for 16 bit integers (e.g. 1 control bit instead of 2), a good fit for this? Is there any other approach you think might work better or are any performance gains unlikely to be realized by a delta encoding scheme? TYVM, I have learned a lot.
1. Daniel Lemire says:
  
  December 14, 2023 at 12:30 am
  
  Without domain knowledge, it can be difficult to make technical recommendations. I do not know enough about your area to comment.
  1. Dan says:
    
    December 14, 2023 at 4:45 pm
    
    That is reasonable. Thanks all the same. Are there some good searchable terms or resources you can recommend so I can learn how to evaluate the efficacy on my own?
    1. Daniel Lemire says:
      
      December 14, 2023 at 6:36 pm
      
      You can pick up an open source library and test different codecs on your data. You could start, for example, with https://github.com/lemire/SIMDCompressionAndIntersection if you are a C++ programmer. There are comparable libraries in most programming languages.