6th May 2022, 2 min read

Fast bitset decoding using Intel AVX-512

Adrien says:

May 7, 2022 at 11:43 am

I think “the bitset 0b111010, you would generate the output 1,3,4,6.” should be “… 1,3,4,5”.

Very interesting as always 👍
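
For readers who want to check the correction above, here is a minimal scalar sketch of bitset decoding. The function name `decode_bitset_scalar` is hypothetical and this is not the code from the post; it only illustrates which indexes the bitset 0b111010 should produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the index of every set bit in `bits` to `out`, lowest index first,
   and returns how many indexes were written. `out` needs room for 64 bytes.
   The name and signature are illustrative assumptions. */
size_t decode_bitset_scalar(uint64_t bits, uint8_t *out) {
    assert(out != NULL);
    size_t count = 0;
    for (uint8_t i = 0; i < 64; i++) {
        if ((bits >> i) & 1) {
            out[count++] = i;
        }
    }
    return count;
}
```

Calling it with 0x3A (that is, 0b111010) fills `out` with 1, 3, 4, 5 and returns 4, which matches the corrected sentence.
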
Jatin Bhateja says:

May 7, 2022 at 6:42 pm

AVX512_VBMI2 offers VPCOMPRESSB, so one can directly compress a 512-bit packed byte vector holding the values 0-63 under the influence of a 64-bit mask. This can replace the unrolled instruction sequence above.
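
A rough sketch of that idea, assuming GCC or Clang: the `_mm512_maskz_compress_epi8` intrinsic (which maps to VPCOMPRESSB) packs the index bytes selected by the mask to the front of the vector. The function name `decode_bitset_compress` and the scalar fallback are assumptions added for illustration so the block also builds on machines without AVX512_VBMI2. It is not the code from the post or from primesieve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX512VBMI2__) && defined(__AVX512BW__)
#include <immintrin.h>
#endif

/* Stores the indexes of the set bits of `bits` at the front of `out`,
   zero-fills the remaining bytes, and returns the number of set bits.
   `out` must have room for 64 bytes. Hypothetical helper for illustration. */
size_t decode_bitset_compress(uint64_t bits, uint8_t *out) {
    assert(out != NULL);
#if defined(__AVX512VBMI2__) && defined(__AVX512BW__)
    /* Build a vector whose byte i holds the value i (0..63). */
    uint8_t identity[64];
    for (int i = 0; i < 64; i++) {
        identity[i] = (uint8_t)i;
    }
    __m512i indexes = _mm512_loadu_si512(identity);
    /* VPCOMPRESSB: keep the bytes whose mask bit is set, packed
       contiguously from byte 0; the unused tail is zeroed. */
    __m512i packed = _mm512_maskz_compress_epi8((__mmask64)bits, indexes);
    _mm512_storeu_si512(out, packed);
    /* Compiler builtin (GCC/Clang), not part of the AVX-512 API. */
    return (size_t)__builtin_popcountll(bits);
#else
    /* Scalar fallback with the same observable result. */
    size_t count = 0;
    for (uint8_t i = 0; i < 64; i++) {
        if ((bits >> i) & 1) {
            out[count++] = i;
        }
    }
    for (size_t j = count; j < 64; j++) {
        out[j] = 0;
    }
    return count;
#endif
}
```

To exercise the vector path, the file would need to be compiled with something like `-mavx512vbmi2 -mavx512bw` and run on a CPU that supports those extensions; otherwise the fallback is used.
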
1. Kim Walisch says:
  
  May 8, 2022 at 12:09 pm
  
  I have implemented a modified version of the AVX512_VBMI2 bitset decoding algorithm in my primesieve project that was partially inspired by Daniel’s previous blog posts on the same topic. The great thing about using VPCOMPRESSB is that it significantly improves performance for sparse bit streams (that are distributed relatively evenly). For example, if at most 16 bits are set in the uint64_t bits variable, an algorithm using VPCOMPRESSB would execute only about 1/4 of the instructions compared to the algorithm from this blog post. Here is a link to my AVX512_VBMI2 algorithm: https://github.com/kimwalisch/primesieve/blob/9e4e5773f122f71520a9561282e41a78948e6c89/src/PrimeGenerator.cpp#L422