Daniel Lemire's blog


Transcoding Unicode with AVX-512: AMD Zen 4 vs. Intel Ice Lake

12 thoughts on “Transcoding Unicode with AVX-512: AMD Zen 4 vs. Intel Ice Lake”

  1. Alex says:

    It’s worth noting that Zen 4 implements AVX-512 by splitting execution into two 256-bit stages, so instructions take twice as many cycles (at least those that take one cycle on Intel; for complex instructions the difference is less than 2x, and in fact Zen 4 has powerful shuffling units, IIRC).

    1. Evidently, this does not seem to affect the performance negatively in a significant manner, at least in these tests. Note that we make extensive use of AVX-512.

    2. Goran Mitrovic says:

      That is not true. It uses two 256-bit units (if available), but it still takes only a single cycle.

  2. David says:

    Intel made their compilers available at no cost some time back. They can be found at https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html .

    1. That’s interesting. I gave up on Intel compilers some time ago because it was tiring to manage the licensing. It seems like great news that they have simplified the process.

      1. Joe Duarte says:

        Intel switched to LLVM two years ago for their main C/C++ compiler, but they still release and update their Compiler Classic, based on their own compiler internals: https://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Release_history

        They seem to support a lot more optimization and acceleration than competitors, though the branding for the features is hard to keep track of. They might be the only compiler to support the new matrix instructions (AMX), and they have broad support for OpenMP up through 5.x, as well as for their libraries such as Threading Building Blocks and for SYCL/OpenCL.

        But it’s hard to keep track of their branding, products, and features. A lot of it is under “OneAPI” now. I think OneAPI is meant to include their compiler, but I’m not sure.

        Have you looked at the matrix instructions?

        1. I have not yet had access to a processor with AMX instructions.

  3. SubOptimal says:

    Only a tiny adjustment to the instructions.

    # will download the file as d2cIxRx
    wget https://cutt.ly/d2cIxRx

    # will download the file as Arabic-Lipsum.utf8.txt
    wget --content-disposition https://cutt.ly/d2cIxRx

    1. Thanks!

  4. Alex says:

    Is there any reason for maintaining “a database formatted with UTF-16”? I had thought that the only use for UTF-16 in the modern age is for legacy operating system interfaces.

    1. Last I checked, SQL Server defaulted to UTF-16. It is possible to use UTF-8 with recent versions, but it wasn’t the default when I last looked into it.

  5. Anton Ertl says:

    Results from a Xeon W-1370P (Rocket Lake); I don’t know which ones you used, so I provide all those that convert UTF-8 to UTF-16 with icelake or iconv:

    convert_utf8_to_utf16+icelake, input size: 81685, iterations: 3000, dataset: Arabic-Lipsum.utf8.txt
    1.403 ins/byte, 0.440 cycle/byte, 11.871 GB/s (0.3 %), 5.224 GHz, 3.189 ins/cycle
    2.505 ins/char, 0.785 cycle/char, 6.651 Gc/s (0.3 %) 1.78 byte/char
    convert_utf8_to_utf16+iconv, input size: 81685, iterations: 3000, dataset: Arabic-Lipsum.utf8.txt
    32.378 ins/byte, 5.294 cycle/byte, 0.983 GB/s (0.2 %), 5.202 GHz, 6.115 ins/cycle
    57.791 ins/char, 9.450 cycle/char, 0.550 Gc/s (0.2 %) 1.78 byte/char
    convert_utf8_to_utf16_with_dynamic_allocation+icelake, input size: 81685, iterations: 3000, dataset: Arabic-Lipsum.utf8.txt
    1.660 ins/byte, 0.526 cycle/byte, 9.919 GB/s (0.6 %), 5.220 GHz, 3.155 ins/cycle
    2.964 ins/char, 0.939 cycle/char, 5.557 Gc/s (0.6 %) 1.78 byte/char
    convert_utf8_to_utf16_with_errors+icelake, input size: 81685, iterations: 3000, dataset: Arabic-Lipsum.utf8.txt
    1.403 ins/byte, 0.435 cycle/byte, 12.009 GB/s (0.5 %), 5.225 GHz, 3.225 ins/cycle
    2.505 ins/char, 0.777 cycle/char, 6.728 Gc/s (0.5 %) 1.78 byte/char
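
For readers comparing these figures, the reported metrics appear to be related by simple arithmetic: throughput in GB/s is the clock frequency divided by cycles per byte, and instructions per cycle is instructions per byte divided by cycles per byte. The sketch below is only a sanity check under that assumption, using the numbers from the first convert_utf8_to_utf16+icelake result above; the function names are hypothetical and not part of the benchmark tool.

```python
# Figures copied from the first convert_utf8_to_utf16+icelake result above.
INS_PER_BYTE = 1.403
CYCLES_PER_BYTE = 0.440
FREQUENCY_GHZ = 5.224
BYTES_PER_CHAR = 1.78  # rounded in the report, so derived values are approximate


def throughput_gb_per_s(frequency_ghz: float, cycles_per_byte: float) -> float:
    """GB/s equals (cycles per nanosecond) / (cycles per byte)."""
    return frequency_ghz / cycles_per_byte


def instructions_per_cycle(ins_per_byte: float, cycles_per_byte: float) -> float:
    """Instructions retired per cycle, derived from the per-byte figures."""
    return ins_per_byte / cycles_per_byte


def throughput_gchar_per_s(gb_per_s: float, bytes_per_char: float) -> float:
    """Billions of characters per second, given the average bytes per character."""
    return gb_per_s / bytes_per_char


gb_s = throughput_gb_per_s(FREQUENCY_GHZ, CYCLES_PER_BYTE)
ipc = instructions_per_cycle(INS_PER_BYTE, CYCLES_PER_BYTE)
gc_s = throughput_gchar_per_s(gb_s, BYTES_PER_CHAR)

if __name__ == "__main__":
    print(f"{gb_s:.2f} GB/s, {ipc:.2f} ins/cycle, {gc_s:.2f} Gc/s")
```

The derived values should land close to the reported 11.871 GB/s, 3.189 ins/cycle and 6.651 Gc/s; small differences are expected because the inputs are themselves rounded.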