Daniel Lemire's blog

, 3 min read

Stream VByte: first independent assessment

4 thoughts on “Stream VByte: first independent assessment”

  1. KWillets says:

    I made a start at SIMD-izing the encoding; so far my best estimate is at least 16 instructions per quadword. One hitch is that there is no SIMD unsigned gt/lt compare, so instead I’m doing shift-right and compare to zero for the three different widths. The output bitmasks are conveniently -1 or 0 however, so we can just sum them together with 3 to get the 2-bit code value for each lane.

    The shuffle looks like it will need a table unless there’s some fancy way to build it — that’s still on my todo list.

    1. Wow.

      Note that there is an open issue regarding endianess…

      https://github.com/lemire/streamvbyte/issues/4

      It seems that the current encoder/decoder are little-endian whereas the paper describes a big-endian format. I have not had time to look into the matter, but for interoperability, this matters.

      One hitch is that there is no SIMD unsigned gt/lt compare

      I think that if you subtract 1&lt&lt(L-1) to L-bit integers prior to doing the signed comparison, you get an unsigned comparison.

      1. Kendall Willets says:

        I did overlook the fact that the sub would only need to happen once for the three comparisons — I’ll look at this again tonight.

        1. I think you can almost always get unsigned comparisons very cheaply, which is probably why Intel did not bother adding support for it.