Daniel Lemire's blog


Validating UTF-8 bytes using only 0.45 cycles per byte (AVX edition)

4 thoughts on “Validating UTF-8 bytes using only 0.45 cycles per byte (AVX edition)”

  1. Ludovic Kuty says:

    “What if we use 256-byte registers instead?” IMHO there is a typo.

  2. Badger says:

    “What if we use 256-byte registers instead?”

    Then we’re living in the future where 64k-bit CPUs are normal! =)

  3. Michael Bisbjerg says:

    Does the code assume that UTF-8 strings are always byte aligned?

    The trouble with UTF-8 is that it is variable-length, so eventually a character will cross a 32-byte boundary.

    1. There is no assumption made with respect to alignment.
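
To illustrate the point in this exchange, here is a minimal scalar sketch of how a validator can process input in 32-byte blocks without caring where characters fall relative to block boundaries. It is not the AVX code from the post. The names `utf8_state`, `utf8_feed`, and `utf8_validate_blocks` are hypothetical. The idea is only that a small amount of state (how many continuation bytes are still expected, and which range the next one must fall in) is carried from one block to the next, so a character split across two blocks is checked as usual.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State carried from one block to the next (hypothetical names). */
typedef struct {
    int remaining;   /* continuation bytes still expected */
    uint8_t lo, hi;  /* allowed range for the next continuation byte */
} utf8_state;

/* Validate n bytes, updating the carried state. Returns false on error. */
static bool utf8_feed(utf8_state *s, const uint8_t *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint8_t b = p[i];
        if (s->remaining > 0) {
            /* Inside a multi-byte character, possibly begun in a previous block. */
            if (b < s->lo || b > s->hi) return false;
            s->remaining--;
            s->lo = 0x80;
            s->hi = 0xBF;
        } else if (b < 0x80) {
            /* ASCII: nothing to track. */
        } else if (b >= 0xC2 && b <= 0xDF) {
            /* 2-byte lead (0xC0 and 0xC1 would be overlong). */
            s->remaining = 1;
            s->lo = 0x80;
            s->hi = 0xBF;
        } else if (b >= 0xE0 && b <= 0xEF) {
            /* 3-byte lead: reject overlong forms (E0) and surrogates (ED). */
            s->remaining = 2;
            s->lo = (b == 0xE0) ? 0xA0 : 0x80;
            s->hi = (b == 0xED) ? 0x9F : 0xBF;
        } else if (b >= 0xF0 && b <= 0xF4) {
            /* 4-byte lead: reject overlong forms (F0) and values above U+10FFFF (F4). */
            s->remaining = 3;
            s->lo = (b == 0xF0) ? 0x90 : 0x80;
            s->hi = (b == 0xF4) ? 0x8F : 0xBF;
        } else {
            /* Stray continuation byte or invalid lead byte. */
            return false;
        }
    }
    return true;
}

/* Walk the input in 32-byte blocks; data may start at any address. */
bool utf8_validate_blocks(const uint8_t *data, size_t len) {
    utf8_state s = {0, 0x80, 0xBF};
    for (size_t off = 0; off < len; off += 32) {
        size_t n = (len - off < 32) ? (len - off) : 32;
        if (!utf8_feed(&s, data + off, n)) return false;
    }
    /* The input must not end in the middle of a character. */
    return s.remaining == 0;
}
```

With this structure, a two-byte character whose first byte is the last byte of one block and whose second byte is the first byte of the next is accepted, and the same input shifted by one byte is accepted as well. A vectorized version would presumably use unaligned loads and carry analogous information between registers, but that is an assumption about the general approach rather than a description of the post's code.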