Daniel Lemire's blog


Quickly pruning elements in SIMD vectors using the simdprune library

9 thoughts on “Quickly pruning elements in SIMD vectors using the simdprune library”

  1. Gianluca Della Vedova says:

    In the example of the README of your project, shouldn’t the zero vector 0,0,0,0,0,0,0,0 be a one vector 1,…,1?

    1. Yes. Fixed. Thank you.

  2. KWillets says:

    There was a thread on Stack Overflow about packing left from a mask, and it recommended using PEXT to extract the bits. Would that work here?

    1. Yes. It can be made to work. It might be very useful for pruning bytes because the current solution, with a large table, is not ideal.

      How to make it all come together for high efficiency is the tricky part.

      1. KWillets says:

        Here’s the thread: http://stackoverflow.com/questions/36932240/avx2-what-is-the-most-efficient-way-to-pack-left-based-on-a-mask

        They also mention VCOMPRESSPS for 32-bit values under AVX512.

        1. I am aware of vcompress, and it is mentioned in the README of the library. It is not very useful yet because none of us has access to hardware that supports it.

          The BMI code is cool.

          1. I have added the BMI approach for benchmarking purposes and, in my tests, it is slower. The BMI instructions can be nice, but they often have high latency, so if you chain them through data dependencies, the result is not always efficient.

            1. KWillets says:

              Ryzen instructions just came out on Agner’s site, and PEXT/PDEP have reciprocal latency of 18 cycles. 🙁

              1. @KWillets

                That sounds bad. On the other hand, I am not sure that PEXT/PDEP are common in software, or even that they will become common.
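
For readers who have not seen the pack-left trick discussed in this thread, here is a minimal scalar sketch of the general idea, written in Python so it runs anywhere. It is not the simdprune code and not a copy of the Stack Overflow answer: `pdep` and `pext` below are software models of the BMI2 instructions, the helper names `pack_left_indices` and `prune` are hypothetical, and the convention that a set bit means "keep this lane" is an assumption made for illustration. On real hardware the resulting indices would feed a vector permute instead of a Python list lookup.

```python
def pdep(value, mask):
    """Software model of BMI2 PDEP: scatter the low bits of value to the set bits of mask."""
    result = 0
    in_bit = 0
    while mask:
        lowest = mask & -mask          # isolate the lowest set bit of mask
        if (value >> in_bit) & 1:
            result |= lowest
        in_bit += 1
        mask &= mask - 1               # clear the lowest set bit
    return result


def pext(value, mask):
    """Software model of BMI2 PEXT: gather the bits of value selected by mask into the low bits."""
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask
        if value & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1
    return result


IDENTITY_BYTES = 0x0706050403020100    # lane indices 0..7, one per byte


def pack_left_indices(keep_mask):
    """Return 8 permutation indices that move the kept lanes to the front.

    Assumption: bit i of keep_mask set means lane i is kept.
    """
    # Expand each mask bit into a full byte (0x00 or 0xFF).
    expanded = pdep(keep_mask, 0x0101010101010101) * 0xFF
    # Gather the index bytes of the kept lanes into the low bytes.
    packed = pext(IDENTITY_BYTES, expanded)
    return [(packed >> (8 * i)) & 0xFF for i in range(8)]


def prune(lanes, keep_mask):
    """Scalar stand-in for the vector permute: return only the kept lanes, in order."""
    indices = pack_left_indices(keep_mask)
    kept = bin(keep_mask).count("1")
    return [lanes[indices[i]] for i in range(kept)]


if __name__ == "__main__":
    # Keep lanes 1, 4, 5 and 7 of an 8-lane vector.
    print(prune([10, 11, 12, 13, 14, 15, 16, 17], 0b10110010))
```

The sketch also hints at why latency matters here: the PEXT depends on the output of the PDEP, and the permute depends on the output of the PEXT, so the three steps form one serial dependency chain per vector.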