Can confirm the same on OS X’s clang-600.0.54.
Aha: https://llvm.org/bugs/show_bug.cgi?id=1488
It’s been in there for a while apparently.
Note, however, that this optimization in LLVM is highly specialized: it pattern-matches against the exact implementation you gave and turns it into a popcnt. There is no magic going on, and any number of semantics-preserving rewrites of the C code will cause LLVM to miss the optimization, because the pattern is no longer recognized.
The one that caught me off guard while trying to get some O(n) loops to benchmark was the automatic conversion of sum(i^n) to polynom(n): https://godbolt.org/g/p6RxBh
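In case it helps to see it without following the link, here is a minimal sketch of the kind of loop that gets replaced by a closed-form expression (the function name sum_below is just for illustration, and the exact output depends on the compiler version, so it is worth checking on godbolt.org):

#include <stdint.h>

/* A naive O(n) summation loop.  Clang's induction-variable analysis
   typically replaces it with a closed-form expression, roughly
   n * (n - 1) / 2, so timing it as an O(n) benchmark is misleading. */
static uint64_t sum_below(uint64_t n) {
    uint64_t s = 0;
    for (uint64_t i = 0; i < n; i++) s += i;
    return s;
}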
I probably wouldn’t understand an in-depth explanation, but I would still like to know – at least on a superficial level – how clang does this. It seems to me that an extremely sophisticated algorithm would be needed to achieve this.
What if you change the code slightly? Like change x != 0 to x != 1 or something.
It does that by transforming the AST into a normal form and then performing pattern matching on it, trying to find common subsections of code it can optimize. It can work wonderfully, but all these patterns are handcrafted, and to be really useful they often require the programmer to know how to write the code in a way that makes it easy for the compiler to recognize them.
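To make that concrete, here is a semantically equivalent bit count written with a shift-and-test loop instead of clearing the lowest set bit (the name popcount_shift is just for illustration). Whether a given clang or gcc version still turns this variant into a popcnt is not guaranteed; it is easy to check on godbolt.org:

#include <stdint.h>

/* Computes the same result as the n &= n - 1 version, but the loop shape
   is different, so the idiom recognizer has to match a different pattern
   and may simply not have a rule for it. */
static unsigned popcount_shift(uint64_t x) {
    unsigned count = 0;
    while (x) {
        count += (unsigned)(x & 1);
        x >>= 1;
    }
    return count;
}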
You might find this section of the LLVM code interesting:
https://github.com/llvm-mirror/llvm/blob/f36485f7ac2a8d72ad0e0f2134c17fd365272285/lib/Transforms/Scalar/LoopIdiomRecognize.cpp#L960
All the idiom recognition stuff is pretty interesting.
Compilers look for certain patterns in the logic of what the code does.
A simpler example of this is rotate instructions, where the common idiom is x>>3 | x<<(32-3).
Compilers have recognized that as a 32-bit rotate instruction for a long time.
(There are caveats, though, especially for variable shift counts: http://stackoverflow.com/questions/776508/best-practices-for-circular-shift-rotate-operations-in-c)
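The workaround discussed in that Stack Overflow thread is to mask the shift count so the expression has no undefined behavior even for a count of zero; a sketch of the usual form (the name rotl32 is mine):

#include <stdint.h>

/* Rotate left by n bits with no undefined behavior: masking keeps both
   shift counts in the range 0..31, and recent gcc/clang versions typically
   still recognize the pattern and emit a single rotate instruction. */
static inline uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}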
You might be interested in seeing what’s happening with synthesizing superoptimizers, exploiting the big advances that have been made in SAT solvers.
http://blog.regehr.org/archives/1219
Does it do the same for any other standard popcount algorithms?
Current versions of clang and gcc will also optimize this (the K&R implementation for counting bits) to popcount (I tested with the Godbolt compiler explorer):
#include <stddef.h>
#include <stdint.h>

/* Kernighan's bit count: each iteration clears the lowest set bit of n. */
static inline size_t _count_bits(uint64_t n) {
    size_t count;
    for (count = 0; n; count++) n &= n - 1;
    return count;
}
Another example is this code, which performs a bitwise left shift with wrap (unlike the << operator), and is optimized to either a rotl or rotr instruction depending on the shift size.
/* Rotate-left idiom; clang and gcc compile it to a single rotate instruction.
   Note: k should be in 1..63, since x >> (64 - 0) would be a full-width shift, which is undefined behavior. */
static inline uint64_t rotl(const uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}
I really wish there were a solid cross-platform library for bitwise operations like these which resolve to native instructions where possible, and fall back to decent software implementations on architectures where those instructions aren’t present. Maybe there is and I don’t know what it’s called, or there is a bigger picture I’m missing.
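In the meantime, a thin wrapper around the compiler builtins with a plain C fallback gets most of the way there. A sketch, assuming GCC or clang (the name portable_popcount64 is made up for the example; other operations would get similar wrappers):

#include <stdint.h>

/* Population count: use the GCC/clang builtin where available (it lowers
   to a popcnt instruction when compiled for a target that has one),
   otherwise fall back to the same n &= n - 1 loop shown above. */
static inline unsigned portable_popcount64(uint64_t n) {
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcountll(n);
#else
    unsigned count = 0;
    for (; n; count++) n &= n - 1;
    return count;
#endif
}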
@Anonymous, the code is here:
https://github.com/llvm-mirror/llvm/blob/ae889d36724efb174e8cc05d26e655c4c4ab8867/lib/Transforms/Scalar/LoopIdiomRecognize.cpp#L1077-L1123
x &= x - 1 is the first example in Warren’s book “Hacker’s Delight”.
I think one measure of optimization quality is producing the same code for all programs that do the same thing:
http://shape-of-code.coding-guidelines.com/2011/07/24/compiler-benchmarking-for-the-21st-century/
That’s not magic when you have a basic understanding of Hamming weight.
C is a general-purpose language. You can use it as you like.
Disable all compiler optimization and then look at the asm code.
Some cool compiler optimization stuff going on with Spark recently:
https://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware/
And how many CPU cycles does it take for popcnt to complete? The compiler might save on program size, but you’re ultimately dependent on microcode for non-trivial instructions. That’s not magic either.
popcnt has a throughput of 1 instruction every cycle on recent x64 processors.
> What does that mean? It means that C is a high-level language.
No. This means that C has high-level compilers, if anything.
They happen to recognize the pattern (or patterns) in the code outlined as special cases for optimization, and so manage to optimize it into a single assembly instruction.
C is still as low level as it was.