30th September 2017, 3 min read

Stream VByte: first independent assessment

4 thoughts on “Stream VByte: first independent assessment”

KWillets says:

October 2, 2017 at 5:19 am

I made a start at SIMD-izing the encoding; so far my best estimate is at least 16 instructions per quadword. One hitch is that there is no SIMD unsigned gt/lt compare, so instead I’m doing shift-right and compare to zero for the three different widths. The output bitmasks are conveniently -1 or 0 however, so we can just sum them together with 3 to get the 2-bit code value for each lane.

The shuffle looks like it will need a table unless there’s some fancy way to build it — that’s still on my todo list.
1. Daniel Lemire says:
  
  October 2, 2017 at 11:36 pm
  
  Wow.
  
  Note that there is an open issue regarding endianess…
  
  https://github.com/lemire/streamvbyte/issues/4
  
  It seems that the current encoder/decoder are little-endian whereas the paper describes a big-endian format. I have not had time to look into the matter, but for interoperability, this matters.
  
  One hitch is that there is no SIMD unsigned gt/lt compare
  
  I think that if you subtract 1&lt&lt(L-1) to L-bit integers prior to doing the signed comparison, you get an unsigned comparison.
  1. Kendall Willets says:
    
    October 3, 2017 at 12:26 am
    
    I did overlook the fact that the sub would only need to happen once for the three comparisons — I’ll look at this again tonight.
    1. Daniel Lemire says:
      
      October 3, 2017 at 12:37 am
      
      I think you can almost always get unsigned comparisons very cheaply, which is probably why Intel did not bother adding support for it.