Daniel Lemire's blog

2 min read

Implementing ‘strlen’ using SVE

3 thoughts on “Implementing ‘strlen’ using SVE”

  1. Laurent says:

    Hello,

    Per the specification, the SVE vector length cannot exceed 2048 bits (256 bytes), so svcntb() will never return more than 256.

    Beyond the slide deck you linked, Arm has published several routines here: https://github.com/ARM-software/optimized-routines/tree/master/string/aarch64

    1. Thanks: it is great to find out that my missing check was unnecessary.
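
The arithmetic behind that exchange can be sketched in portable C. This is only an illustration, not code from the post: the helper names (`vector_bytes`, `lane_indices_fit_in_u8`, `all_lengths_fit`) are hypothetical, and it assumes that the check under discussion guarded against byte-lane indices (for example, ones produced with something like `svindex_u8`) overflowing 8 bits. It also assumes the architectural rule that SVE vector lengths are multiples of 128 bits, from 128 up to 2048.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed architectural limits: vector length is a multiple of 128 bits,
   from 128 bits up to 2048 bits. */
enum { SVE_MIN_BITS = 128, SVE_MAX_BITS = 2048, SVE_STEP_BITS = 128 };

/* Bytes per vector for a given vector length in bits.
   This models the value svcntb() would report on such hardware. */
unsigned vector_bytes(unsigned vl_bits) {
    assert(vl_bits >= SVE_MIN_BITS && vl_bits <= SVE_MAX_BITS);
    assert(vl_bits % SVE_STEP_BITS == 0);
    return vl_bits / 8;
}

/* True if every byte-lane index 0 .. bytes-1 is representable in a uint8_t. */
bool lane_indices_fit_in_u8(unsigned vl_bits) {
    return vector_bytes(vl_bits) - 1 <= UINT8_MAX;
}

/* Walk every permitted vector length and confirm the indices always fit. */
bool all_lengths_fit(void) {
    for (unsigned bits = SVE_MIN_BITS; bits <= SVE_MAX_BITS; bits += SVE_STEP_BITS) {
        if (!lane_indices_fit_in_u8(bits)) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions, the widest vector holds 256 bytes, so the largest lane index is 255, which still fits in one byte. That is why a run-time check for more than 256 bytes would never fire.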

  2. As an aside, how much is gained by eliminating the edge cases?

    When writing string-intensive applications, I tended to allocate string buffers of a power-of-two size (256 bytes or less) from a string pool, and to reallocate them through a free list. I did this for allocation efficiency, which I measured. As a side effect, those page-aligned, size-quantized buffers would fit your algorithm without edge cases. There has to be some gain there.

    How far does this go if you design a string class to take full advantage by eliminating edge cases?

    When we allocate a string buffer, we could always zero-fill it (so there will always be a zero), or fill it with a non-zero value (so the only zero is the one that belongs to the written data).

    Most application strings are short: less than 256 bytes, and mostly less than 80 bytes. How much is gained by unrolling the loop?

    Clearly, for very long strings, the edge case hardly matters. But most strings are short, close to the size of a vector stride.
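
The padded-buffer idea in this comment can be sketched in portable C. This is a minimal sketch, not the SVE routine from the post. It assumes a hypothetical pool that hands out zero-filled buffers whose capacity is a multiple of the stride, and it uses an 8-byte word as a stand-in for a vector register. The names `padded_strlen` and `has_zero_byte` are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classic bit trick: the result is nonzero exactly when some byte of v is zero. */
static int has_zero_byte(uint64_t v) {
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Length of the string in buf, scanning one full 8-byte stride at a time.
   Precondition (the assumption that removes the edge cases): capacity is a
   multiple of 8, so every load stays inside the buffer and no tail loop is
   needed. Returns capacity if no zero byte is found. */
size_t padded_strlen(const char *buf, size_t capacity) {
    assert(capacity % 8 == 0);
    for (size_t base = 0; base < capacity; base += 8) {
        uint64_t word;
        memcpy(&word, buf + base, sizeof word); /* alignment-safe load */
        if (has_zero_byte(word)) {
            /* The zero is known to lie inside this stride; locate it. */
            size_t i = base;
            while (buf[i] != '\0') {
                i++;
            }
            return i;
        }
    }
    return capacity;
}
```

For example, with `char buf[32] = {0};` holding `"hello"`, `padded_strlen(buf, 32)` returns 5 after inspecting a single stride. Whether this beats a general-purpose routine on short strings is exactly the open question raised above, and it would need to be measured.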