Validating UTF-8 bytes using only 0.45 cycles per byte (AVX edition)
4 thoughts on “Validating UTF-8 bytes using only 0.45 cycles per byte (AVX edition)”
“What if we use 256-byte registers instead?” IMHO there is a typo: AVX registers are 256 bits wide, not 256 bytes.
“What if we use 256-byte registers instead?”
Then we’re living in the future where 64k-bit CPUs are normal! =)
Does the code assume that UTF-8 strings are always byte aligned?
The trouble with UTF-8 is its variable-length encoding, so eventually a character will cross a 32-byte boundary.
There is no assumption made with respect to alignment.
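To make the boundary question concrete, here is a minimal scalar sketch in C. It is not the AVX code from the post. It only illustrates the general idea that a validator can walk the input in 32-byte blocks from any starting address, as long as it carries a little state from one block to the next. The names `utf8_state`, `utf8_block` and `utf8_valid_chunked` are hypothetical, made up for this example. A vectorized version would presumably read each block with an unaligned load such as `_mm256_loadu_si256`, but that part is an assumption and is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State carried from one block to the next (hypothetical names). */
typedef struct {
    int pending;     /* continuation bytes still expected */
    uint8_t lo, hi;  /* allowed range for the next continuation byte */
} utf8_state;

/* Validate n bytes, updating the carried state. Returns false on error. */
static bool utf8_block(utf8_state *st, const uint8_t *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint8_t b = p[i];
        if (st->pending > 0) {
            /* We are inside a multi-byte character, possibly one that
               started in the previous block. */
            if (b < st->lo || b > st->hi) return false;
            st->pending--;
            st->lo = 0x80;
            st->hi = 0xBF;
        } else if (b <= 0x7F) {
            /* ASCII: nothing to track. */
        } else if (b >= 0xC2 && b <= 0xDF) {
            /* 2-byte lead (0xC0 and 0xC1 would be overlong). */
            st->pending = 1;
            st->lo = 0x80;
            st->hi = 0xBF;
        } else if (b >= 0xE0 && b <= 0xEF) {
            /* 3-byte lead: exclude overlong forms and surrogates. */
            st->pending = 2;
            st->lo = (b == 0xE0) ? 0xA0 : 0x80;
            st->hi = (b == 0xED) ? 0x9F : 0xBF;
        } else if (b >= 0xF0 && b <= 0xF4) {
            /* 4-byte lead: exclude overlong forms and values above U+10FFFF. */
            st->pending = 3;
            st->lo = (b == 0xF0) ? 0x90 : 0x80;
            st->hi = (b == 0xF4) ? 0x8F : 0xBF;
        } else {
            /* Stray continuation byte or invalid lead byte. */
            return false;
        }
    }
    return true;
}

/* Validate a buffer in 32-byte blocks. No alignment is required of data. */
bool utf8_valid_chunked(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    utf8_state st = {0, 0x80, 0xBF};
    size_t i = 0;
    for (; i + 32 <= len; i += 32) {
        if (!utf8_block(&st, data + i, 32)) return false;
    }
    /* Tail shorter than one block (possibly empty). */
    if (!utf8_block(&st, data + i, len - i)) return false;
    /* The input must not end in the middle of a character. */
    return st.pending == 0;
}
```

With this layout, a three-byte character whose first byte is the last byte of one block and whose remaining bytes open the next block is accepted, because `pending`, `lo` and `hi` survive the block boundary. The pointer passed in can start at any address, since the bytes are only ever read one at a time.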