17th January 2018, 11 min read

Ridiculously fast base64 encoding and decoding

13 thoughts on “Ridiculously fast base64 encoding and decoding”

Bingo Du says:

January 18, 2018 at 10:43 am

Wonderful results!
Translate says:

January 18, 2018 at 11:32 am

Thanks, inspiring article!
Matias N. Goldberg says:

January 18, 2018 at 11:42 pm

Great job!

But you should warn about the use of AVX2.
Unfortunately, the use of AVX2 severely throttles the CPU, which can cause system-wide performance issues as it affects other processes.

See https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/

Nonetheless, great job pointing out it can be done better!
1. Daniel Lemire says:
  
  January 19, 2018 at 2:44 pm
  
  “But you should warn about the use of AVX2.”
  
  The paper is called: “Faster Base64 Encoding and Decoding Using AVX2 Instructions”.
  
  “Unfortunately, the use of AVX2 severely throttles the CPU, which can cause system-wide performance issues”
  
  Intel reduces the turbo frequency depending on the instruction mix. On Skylake X, AVX-512 instructions have a greater effect than AVX2 instructions with multiplications and floating points. Simple AVX2 instructions can be used without any reduction to the turbo frequency. The effect is tiny, and unlikely to be measurable, on processors with few active cores (e.g., 4), but it is larger on wide chips with many active cores (e.g., 28).
  
  If you have a chip with many active cores (many more than 4), if you have a CPU-heavy load, and if AVX-512 does not accelerate the computation much, then you can get a negative outcome. This is discussed in Intel’s optimization manual.
  
  The link you refer to describes exactly this scenario: they have 24-core processors, with all cores active, and they use AVX-512 instructions.
  1. Travis Downs says:
    
    January 21, 2018 at 7:44 pm
    
    To be fair to the grandparent poster, the “normal” frequency of almost any recent Intel chip is totally irrelevant. The chip almost never runs at that speed. It’s almost always either “off” (in some non-zero C-state), running at minimum frequency (i.e., the most efficient frequency, usually at Vmin around 500-1000 MHz), or running at maximum turbo frequency. Rarely, you’ll find it running at other frequencies between the minimum and normal (inclusive), which usually happens during workload transitions.
    
    “Normal” (the frequency printed on the box) isn’t at all special here in terms of how often it’s used: your chip probably runs at “normal” frequency less than 1% of the time. If you want to know how fast your CPU will run something, the turbo frequency is essentially the only number you need to know (and the turbo ratio table for multiple running cores, unfortunately).
    
    Intel puts it on the box, probably for historical reasons and because of the confusing aspect of the turbo ratio depending on the number of running cores, so for my 4-core CPU they can either say “2.6 GHz” or “3.5/3.4/3.3/3.2 GHz”, and for a 28-core CPU, well…
    
    Intel also positions the normal frequency as the “guaranteed” frequency, but in practice this has almost no meaning today: except in very small form factors or with very poor cooling, you’ll generally run at the max turbo indefinitely, and if you get hot enough or draw too much current you can go below normal anyways, so essentially all frequencies are “if conditions permit”.
    
    Historically, and still to some extent today, the normal frequency was important for the power management API the chip offers: they expose to the OS the ability to adjust the frequency between the min and normal frequencies, so normal was relevant there – for turbo speeds you had to let the hardware take control. Later on, the chips offered more control over turbo ratios too, but the interface (i.e., which MSRs you write and what you write) was totally different. These days the recommended mode of operation is “HWP”, which is hardware performance management, essentially giving the CPU control over the whole frequency range (the P-states), so that distinction has mostly disappeared.
    
    I wanted to comment on the AVX2/AVX512 throttling too, since I think there is some misunderstanding above, but this is already long enough… 🙂
    
    I’m happy to add that part later if anyone is interested.
2. Daniel Lemire says:
  
  January 24, 2018 at 1:11 am
  
  Another question is… how certain are we that our software does not already use AVX instructions?
  1. Travis Downs says:
    
    January 24, 2018 at 2:06 am
    
    With the appropriate compiler flags, it is pretty easy to prevent the compiler from emitting AVX2 in the code you are compiling, but that’s only part of the story – you also have to check any third-party libraries you use, especially the C and C++ standard libraries which almost everyone uses.
    
    The C library especially is almost always implemented with AVX2 for methods like memcpy, and you’ll often get these faster methods even if you didn’t compile with AVX2 flags (or even if you compiled before AVX2 existed) through the magic of runtime dispatch (including the runtime linker IFUNC magic).
    
    Finally, even interrupts or other processes running on the same CPU (including at the same time on the sibling hyperthread) might decide to use AVX2, slowing down your whole CPU (the interrupt case is admittedly a bit of a stretch!).
    1. Daniel Lemire says:
      
      January 24, 2018 at 2:12 am
      
      Right. Java certainly JIT-compiles code to use AVX if it detects that the processor supports it.
      1. Travis Downs says:
        
        January 24, 2018 at 2:34 am
        
        Exactly, which is one of the tick marks in the column for “how a runtime-interpreted language like Java can be faster than a native compiled language like C”. That is, it can use CPU instructions that weren’t even invented when the source was compiled!
        
        Alex says:
        
        January 30, 2018 at 1:32 pm
        
        Good point, that never occurred to me!
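
The runtime dispatch discussed in the thread above rests on asking the CPU, at run time, which instruction sets it supports. Below is a minimal Python sketch of that check, assuming a Linux system that exposes `/proc/cpuinfo`; the helper names `cpu_flags` and `read_cpu_flags` are hypothetical. It only reports what the processor advertises. It does not tell you whether a given library or JIT actually chooses AVX2 code paths.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text.

    Assumption: an x86 Linux layout, where a line starts with "flags"
    followed by a colon and a space-separated list of feature names.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags of the running machine; empty set if the file is unavailable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return cpu_flags(f.read())
    except OSError:
        return set()


if __name__ == "__main__":
    flags = read_cpu_flags()
    for feature in ("avx", "avx2", "avx512f"):
        status = "supported" if feature in flags else "not reported"
        print(feature, status)
```

An empty result means the information was not available (for example, on a non-Linux system), not that the CPU lacks AVX2.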
Mica says:

January 19, 2018 at 7:15 am

Hi Daniel

It would be really GREAT if you could make a “SIMD tutorial” for newcomers.

As you said, there is very little information about how to use SIMD in practice.

And please, if you decide to do so use “C” for simplicity 🙂

Best
Mica
1. Daniel Lemire says:
  
  January 19, 2018 at 10:09 pm
  
  I agree Mica.
Amit Dhingra says:

September 19, 2019 at 5:07 am

Hi Daniel,

Is sending images and video files in base64 format inside the HTML file through a web API a good approach, compared with sending the HTML, images, and videos all in a zip file?
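
One measurable factor in that comparison is size: standard base64 emits 4 ASCII characters for every 3 input bytes, so an embedded payload grows by about a third before any transport compression is applied. The following is a minimal Python sketch of that arithmetic, not a full answer to the question; the helper name `base64_size` is hypothetical and the payload is arbitrary stand-in data rather than a real image.

```python
import base64


def base64_size(n_bytes):
    """Length of standard, padded base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    # 1024 arbitrary bytes standing in for a small binary file.
    payload = bytes(range(256)) * 4
    encoded = base64.b64encode(payload)
    # The formula should agree with the library's actual output length.
    assert len(encoded) == base64_size(len(payload))
    print(len(payload), "bytes raw ->", len(encoded), "bytes as base64")
```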