Daniel Lemire's blog

Transcoding Latin 1 strings to UTF-8 strings at 18 GB/s using AVX-512

One thought on “Transcoding Latin 1 strings to UTF-8 strings at 18 GB/s using AVX-512”

  1. -.- says:

    My attempt: https://pastebin.com/pkswn1yt

    The general problem with expanding is that you can only process half a vector of input at a time, since each input byte may turn into two output bytes.
    If non-ASCII characters are rare, converting from UTF-8 makes better use of the vector width than converting to UTF-8.
    In that case, you can try to claw back some performance by adding shortcuts for when few non-ASCII characters are detected. Oddly, though, this doesn’t seem to work very well in my case; I haven’t really investigated why.
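Here is a minimal portable C sketch of the two ideas in the comment above. It is neither the commenter's pastebin code nor the AVX-512 routine from the post. First, every Latin-1 byte at or above 0x80 becomes two UTF-8 bytes, so the output can be up to twice the input. That is why a full-width output vector only consumes half a vector of input. Second, it adds a shortcut that copies blocks of pure ASCII through unchanged. It checks 8 bytes at a time with a 64-bit word (SWAR) rather than AVX-512 registers. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* UTF-8 size of a Latin-1 buffer: bytes below 0x80 stay one byte,
   bytes 0x80..0xFF become two, so the result is at most 2 * len. */
size_t latin1_utf8_length(const uint8_t *src, size_t len) {
    size_t out = len;
    for (size_t i = 0; i < len; i++) {
        out += src[i] >> 7; /* adds 1 for every non-ASCII byte */
    }
    return out;
}

/* Encode one Latin-1 byte as UTF-8 at dst; returns bytes written. */
static size_t encode_latin1_byte(uint8_t c, uint8_t *dst) {
    if (c < 0x80) {
        dst[0] = c;
        return 1;
    }
    dst[0] = (uint8_t)(0xC0 | (c >> 6)); /* always 0xC2 or 0xC3 */
    dst[1] = (uint8_t)(0x80 | (c & 0x3F));
    return 2;
}

/* Transcode Latin-1 to UTF-8. dst must have room for
   latin1_utf8_length(src, len) bytes (2 * len always suffices).
   Returns the number of bytes written. */
size_t latin1_to_utf8(const uint8_t *src, size_t len, uint8_t *dst) {
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t i = 0, o = 0;
    while (i + 8 <= len) {
        uint64_t word;
        memcpy(&word, src + i, 8); /* unaligned-safe 8-byte load */
        if ((word & high_bits) == 0) {
            /* Shortcut: all 8 bytes are ASCII, copy them unchanged. */
            memcpy(dst + o, src + i, 8);
            o += 8;
        } else {
            /* Slow path: expand this block one byte at a time. */
            for (size_t k = 0; k < 8; k++) {
                o += encode_latin1_byte(src[i + k], dst + o);
            }
        }
        i += 8;
    }
    for (; i < len; i++) { /* tail of fewer than 8 bytes */
        o += encode_latin1_byte(src[i], dst + o);
    }
    return o;
}
```

Each block pays for the ASCII check, which is a load, an AND and a branch. When non-ASCII bytes are scattered through the text, few blocks are pure ASCII, so the shortcut rarely fires and mostly adds overhead. That is one plausible reason such shortcuts disappoint, though not necessarily the cause in the commenter's case.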