Hello,
As per the specification, the SVE vector length can't exceed 2048 bits (256 bytes), so svcntb will never return more than 256.
Beyond the slide deck you linked, Arm has published several routines here: https://github.com/ARM-software/optimized-routines/tree/master/string/aarch64
Thanks: it is great to find out that my missing check was unnecessary.
As an aside, how much is gained by eliminating the edge cases?
When writing string-intensive applications, I tended to allocate string buffers of a power-of-two size (256 bytes or less) from a string pool, and to recycle them through a free-list. I did this for allocation efficiency (measured). As a side effect, those page-aligned, size-quantized buffers would fit your algorithm without edge cases. There has to be some gain there.
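To make the pool idea concrete, here is a minimal sketch in portable C of what I mean, not my production code. The names (`str_alloc`, `str_free`, `size_class`) and the 16-byte minimum class are assumptions made up for illustration, and I use C11 `aligned_alloc` so that each buffer is aligned to its own size, which is what keeps a whole-buffer read inside the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_SHIFT 4   /* smallest class: 16 bytes (illustrative assumption) */
#define MAX_SHIFT 8   /* largest class: 256 bytes */
#define NUM_CLASSES (MAX_SHIFT - MIN_SHIFT + 1)

/* A freed buffer stores the link to the next free buffer in its own bytes. */
typedef struct free_node { struct free_node *next; } free_node;

/* One free-list per power-of-two size class. */
static free_node *free_lists[NUM_CLASSES];

/* Map a requested size to a class index, or -1 if it exceeds 256 bytes. */
static int size_class(size_t n)
{
    if (n > ((size_t)1 << MAX_SHIFT))
        return -1;
    int c = 0;
    while (((size_t)1 << (MIN_SHIFT + c)) < n)
        c++;
    return c;
}

/* Capacity in bytes of a given class: 16, 32, 64, 128 or 256. */
static size_t class_capacity(int c)
{
    assert(c >= 0 && c < NUM_CLASSES);
    return (size_t)1 << (MIN_SHIFT + c);
}

/* Hand out a buffer of the smallest class that holds n bytes.
   Reuses a freed buffer when one is available. */
char *str_alloc(size_t n)
{
    int c = size_class(n);
    if (c < 0)
        return NULL;              /* too large for the pool */
    free_node *node = free_lists[c];
    if (node) {
        free_lists[c] = node->next;
        return (char *)node;
    }
    size_t cap = class_capacity(c);
    return aligned_alloc(cap, cap); /* aligned to its own size */
}

/* Return a buffer to the free-list of its class; n is the size
   that was originally requested (or the class capacity). */
void str_free(char *p, size_t n)
{
    int c = size_class(n);
    if (!p || c < 0)
        return;
    free_node *node = (free_node *)p;
    node->next = free_lists[c];
    free_lists[c] = node;
}
```

With this layout, a request for 100 bytes gets a 128-byte buffer aligned to 128, so a routine that reads the buffer in whole vector-sized chunks never runs past the end of the allocation.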
How far does this go if you design a string class to take full advantage by eliminating edge cases?
When we allocate a string buffer, we could either always zero-fill it (so a terminating zero is always present), or fill it with a non-zero value (so the only zero is the one that belongs to the written data).
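As a rough illustration of why such buffers remove the edge case, here is a sketch of a length scan that only ever reads whole 8-byte chunks. It is a portable stand-in using the classic "has a zero byte" word trick, not SVE code, and `quantized_strlen` is a hypothetical name. It assumes the capacity is a multiple of 8 and that the buffer contains at least one zero, which the zero-fill guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the string stored in a size-quantized buffer.
   Assumptions: capacity is a multiple of 8, and the buffer holds
   at least one zero byte (for example because it was zero-filled). */
size_t quantized_strlen(const char *buf, size_t capacity)
{
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;

    assert(capacity % 8 == 0);

    /* Every read is a full 8-byte chunk inside the buffer, so there
       is no head or tail handling and no risk of reading past the end. */
    for (size_t off = 0; off < capacity; off += 8) {
        uint64_t w;
        memcpy(&w, buf + off, sizeof w);
        /* Non-zero exactly when some byte of w is zero. */
        if ((w - ones) & ~w & highs) {
            /* Locate the first zero byte within this chunk. */
            for (size_t i = 0; i < 8; i++)
                if (buf[off + i] == '\0')
                    return off + i;
        }
    }
    return capacity; /* no terminator found: precondition was violated */
}
```

The vector version would do the same thing with one chunk per vector stride; the point is only that the loop body has a single shape, with no special first or last iteration.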
Most application strings are short: less than 256 bytes, and mostly less than 80 bytes. How much is gained by unrolling the loop?
Clearly, for very long strings the edge case hardly matters. But most strings are short, close to the size of a vector stride.