Transcoding Latin 1 strings to UTF-8 strings at 18 GB/s using AVX-512
One thought on “Transcoding Latin 1 strings to UTF-8 strings at 18 GB/s using AVX-512”
My attempt: https://pastebin.com/pkswn1yt
The general problem with expanding is that you’re only processing half a vector of input at a time, because the output can be up to twice as long as the input.
If non-ASCII characters are rare, converting from UTF-8 makes better use of the vector width than converting to UTF-8.
In such a case, you can try to claw back some performance by adding shortcuts for when few non-ASCII characters are detected. Weirdly, though, it doesn’t seem to work too well in my case; I haven’t really investigated why.
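To make the shortcut idea concrete, here is a minimal scalar sketch in portable C. It is not the code from the pastebin link and not an AVX-512 kernel. The function name `latin1_to_utf8` and the 8-byte block size are assumptions chosen for illustration; a vectorized version would presumably apply the same test to a whole register.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post): transcode Latin 1
   to UTF-8. `out` must have room for 2 * len bytes, the worst case.
   Returns the number of bytes written. */
size_t latin1_to_utf8(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *start = out;
    size_t i = 0;

    /* Work on 8-byte blocks so one 64-bit test covers the whole block. */
    while (i + 8 <= len) {
        uint64_t block;
        memcpy(&block, in + i, 8);
        if ((block & UINT64_C(0x8080808080808080)) == 0) {
            /* Shortcut: no byte has its high bit set, so the block is
               pure ASCII and is copied unchanged. */
            memcpy(out, in + i, 8);
            out += 8;
        } else {
            /* Slow path: expand each non-ASCII byte to two bytes. */
            for (size_t j = 0; j < 8; j++) {
                uint8_t c = in[i + j];
                if (c < 0x80) {
                    *out++ = c;
                } else {
                    *out++ = (uint8_t)(0xC0 | (c >> 6));
                    *out++ = (uint8_t)(0x80 | (c & 0x3F));
                }
            }
        }
        i += 8;
    }

    /* Tail: fewer than 8 bytes remain. */
    for (; i < len; i++) {
        uint8_t c = in[i];
        if (c < 0x80) {
            *out++ = c;
        } else {
            *out++ = (uint8_t)(0xC0 | (c >> 6));
            *out++ = (uint8_t)(0x80 | (c & 0x3F));
        }
    }
    return (size_t)(out - start);
}
```

Whether the extra branch pays off depends on how predictable it is. On input where ASCII and non-ASCII blocks alternate unpredictably, the check can cost more than it saves, which may be one reason a shortcut like this underperforms.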