5th January 2023

Transcoding Unicode with AVX-512: AMD Zen 4 vs. Intel Ice Lake

12 thoughts on “Transcoding Unicode with AVX-512: AMD Zen 4 vs. Intel Ice Lake”

Alex says:

January 5, 2023 at 11:07 pm

It’s worth noting that Zen 4 implements AVX-512 by splitting execution into two 256-bit stages, so instructions take twice as many cycles (at least those that take one cycle on Intel; for complex instructions the difference is less than 2x, and in fact Zen 4 has powerful shuffling units, IIRC).
1. Daniel Lemire says:
  
  January 6, 2023 at 4:18 am
  
  Evidently, this does not seem to affect the performance negatively in a significant manner, at least in these tests. Note that we make extensive use of AVX-512.
2. Goran Mitrovic says:
  
  January 6, 2023 at 1:00 pm
  
  That is not true. It uses two 256-bit units (if available), but it takes only a single cycle.
David says:

January 6, 2023 at 2:27 am

Intel made their compilers available at no cost some time back. They can be found at https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html .
1. Daniel Lemire says:
  
  January 6, 2023 at 4:17 am
  
  That’s interesting. I gave up on Intel compilers some time ago because it was tiring to manage the licensing. It seems like great news that they have simplified the process.
  1. Joe Duarte says:
    
    January 6, 2023 at 5:25 pm
    
    Intel switched to LLVM two years ago for their main C/C++ compiler, but they still release and update their Compiler Classic, based on their own compiler internals: https://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Release_history
    
    They seem to support a lot more optimization and acceleration than competitors, though the branding for the features is hard to keep track of. They might be the only compiler to support the new matrix instructions (AMX), and they have extensive support for OpenMP up through 5.x, along with libraries like Threading Building Blocks and SYCL/OpenCL support.
    
    But it’s hard to keep track of their branding, products, and features. A lot of it is under “OneAPI” now. I think OneAPI is meant to include their compiler, but I’m not sure.
    
    Have you looked at the matrix instructions?
    1. Daniel Lemire says:
      
      January 12, 2023 at 3:21 pm
      
      I have not yet had access to a processor with AMX instructions.
SubOptimal says:

January 6, 2023 at 10:18 am

Only a tiny adjustment to the instructions.

# will download the file as d2cIxRx
wget https://cutt.ly/d2cIxRx

# will download the file as Arabic-Lipsum.utf8.txt
wget --content-disposition https://cutt.ly/d2cIxRx
1. Daniel Lemire says:
  
  January 6, 2023 at 1:00 pm
  
  Thanks!
Alex says:

January 14, 2023 at 9:42 pm

Is there any reason for maintaining “a database formatted with UTF-16”? I had thought that the only use for UTF-16 in the modern age is for legacy operating system interfaces.
1. Daniel Lemire says:
  
  January 14, 2023 at 10:19 pm
  
  Last I checked, SQL Server defaulted to UTF-16. It is possible to use UTF-8 with recent versions, but it wasn’t the default when I last looked into it.
Anton Ertl says:

March 16, 2023 at 8:45 am

Results from a Xeon W-1370P (Rocket Lake). I don’t know which ones you used, so I provide all those that are UTF-8 to UTF-16 with icelake or iconv:

convert_utf8_to_utf16+icelake, input size: 81685, iterations: 3000, dataset: Arabic-Lipsum.utf8.txt
1.403 ins/byte, 0.440 cycle/byte, 11.871 GB/s (0.3 %), 5.224 GHz, 3.189 ins/cycle
2.505 ins/char, 0.785 cycle/char, 6.651 Gc/s (0.3 %) 1.78 byte/char

convert_utf8_to_utf16+iconv, input size: 81685, iterations: 3000, dataset: Arabic-Lipsum.utf8.txt
32.378 ins/byte, 5.294 cycle/byte, 0.983 GB/s (0.2 %), 5.202 GHz, 6.115 ins/cycle
57.791 ins/char, 9.450 cycle/char, 0.550 Gc/s (0.2 %) 1.78 byte/char

convert_utf8_to_utf16_with_dynamic_allocation+icelake, input size: 81685, iterations: 3000, dataset: Arabic-Lipsum.utf8.txt
1.660 ins/byte, 0.526 cycle/byte, 9.919 GB/s (0.6 %), 5.220 GHz, 3.155 ins/cycle
2.964 ins/char, 0.939 cycle/char, 5.557 Gc/s (0.6 %) 1.78 byte/char

convert_utf8_to_utf16_with_errors+icelake, input size: 81685, iterations: 3000, dataset: Arabic-Lipsum.utf8.txt
1.403 ins/byte, 0.435 cycle/byte, 12.009 GB/s (0.5 %), 5.225 GHz, 3.225 ins/cycle
2.505 ins/char, 0.777 cycle/char, 6.728 Gc/s (0.5 %) 1.78 byte/char
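
For readers who want to sanity-check figures like these, the reported GB/s should roughly equal the clock frequency divided by the cycles per byte. The short Python sketch below redoes that arithmetic for the icelake and iconv rows and compares the two throughputs. The numbers are copied from the output above; the dictionary layout and the function names (`throughput_gb_s`, `speedup`) are illustrative choices, not part of the benchmark tool.

```python
# Figures copied from the benchmark output above (Xeon W-1370P, Rocket Lake).
# The structure and names here are illustrative assumptions, not tool output.
results = {
    "icelake": {"cycles_per_byte": 0.440, "ghz": 5.224, "reported_gb_s": 11.871},
    "iconv": {"cycles_per_byte": 5.294, "ghz": 5.202, "reported_gb_s": 0.983},
}


def throughput_gb_s(ghz, cycles_per_byte):
    """Derive GB/s as (billions of cycles per second) / (cycles per byte)."""
    return ghz / cycles_per_byte


def speedup(fast, slow):
    """Ratio of the reported throughputs of two benchmark rows."""
    return fast["reported_gb_s"] / slow["reported_gb_s"]


if __name__ == "__main__":
    for name, row in results.items():
        derived = throughput_gb_s(row["ghz"], row["cycles_per_byte"])
        print(f"{name}: derived {derived:.3f} GB/s, "
              f"reported {row['reported_gb_s']:.3f} GB/s")
    ratio = speedup(results["icelake"], results["iconv"])
    print(f"icelake vs. iconv: {ratio:.1f}x")
```

Small differences between the derived and reported values are expected, since the printed cycle/byte and GHz figures are rounded.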