Daniel Lemire's blog

The surprising cleverness of modern compilers

18 thoughts on “The surprising cleverness of modern compilers”

  1. KWillets says:

    Can confirm the same on OS X’s clang-600.0.54.

  2. KWillets says:

    Aha: https://llvm.org/bugs/show_bug.cgi?id=1488

    It’s been in there for a while apparently.

  3. Stefan says:

    Note, however, that this optimization in LLVM is highly specialized: it pattern-matches against the exact implementation you gave and turns it into a popcnt. There is no magic going on, and any number of semantics-preserving rewrites of the C code will cause LLVM to miss the optimization, because the pattern is no longer recognized.

  4. Oleg says:

    The one that caught me off guard while trying to get some O(n) loops to benchmark was the automatic conversion of sum(i^n) to polynom(n): https://godbolt.org/g/p6RxBh
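    For instance (a sketch: this assumes clang at -O2, and the function name is mine), a plain triangular-sum loop is replaced by its closed form, with no loop left in the output:

```c
#include <stdint.h>

/* clang at -O2 recognizes this O(n) loop and emits the closed
   form n * (n + 1) / 2 instead of iterating. */
uint64_t sum_to_n(uint64_t n) {
    uint64_t sum = 0;
    for (uint64_t i = 1; i <= n; i++)
        sum += i;
    return sum;
}
```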

  5. Anonymous says:

    I probably wouldn’t understand an in-depth explanation, but I would still like to know – at least on a superficial level – how clang does this. It seems to me that an extremely sophisticated algorithm would be needed to achieve this.

    What if you change the code slightly? Like change x != 0 to x != 1 or something.

    1. Corralx says:

      It does that by transforming the AST into a normal form and then performing pattern matching on it, trying to find common subsections of code it can optimize. It can work wonderfully, but all these patterns are handcrafted, and to be really useful they often require the programmer to know how to write the code in a way that makes it easy for the compiler to recognize them.

    2. Julian Squires says:

      You might find this section of the LLVM code interesting:
      https://github.com/llvm-mirror/llvm/blob/f36485f7ac2a8d72ad0e0f2134c17fd365272285/lib/Transforms/Scalar/LoopIdiomRecognize.cpp#L960

      All the idiom recognition stuff is pretty interesting.

    3. Peter Cordes says:

      Compilers look for certain patterns in the logic of what the code does.

      A simpler example of this is rotate instructions, where the common idiom is x>>3 | x<<(32-3).

      Compilers have recognized that idiom as a 32-bit rotate instruction for a long time.

      (There are caveats, though, esp for variable shift-counts: http://stackoverflow.com/questions/776508/best-practices-for-circular-shift-rotate-operations-in-c)
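      The usual fix for variable shift counts (a sketch of the masked idiom discussed in that thread; the helper name is mine) is to mask the count so both shifts stay in range:

```c
#include <stdint.h>

/* Rotate left by k bits.  Masking with 31 keeps both shift counts
   in 0..31, so the expression is defined even for k == 0, and recent
   gcc/clang still compile it to a single rotate instruction on x86. */
static inline uint32_t rotl32(uint32_t x, unsigned k) {
    return (x << (k & 31)) | (x >> (-k & 31));
}
```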

  6. Paul D. says:

    You might be interested in seeing what’s happening with synthesizing superoptimizers, exploiting the big advances that have been made in SAT solvers.

    http://blog.regehr.org/archives/1219

  7. JP says:

    Does it do the same for any other standard popcount algorithms?

    1. bsa says:

      Current versions of clang and gcc will also optimize this (the K&R implementation for counting bits) to popcount. I tested with the Godbolt compiler explorer:

      static inline size_t _count_bits(uint64_t n) {
          size_t count;
          for (count = 0; n; count++) n &= n - 1;
          return count;
      }

      Another example is this code, which performs a bitwise left shift with wrap (unlike the << operator), and is optimized to either a rotl or rotr instruction depending on the shift size.

      static inline uint64_t rotl(const uint64_t x, int k) {
          /* note: assumes 0 < k < 64; at k == 0 the shift x >> 64 is undefined */
          return (x << k) | (x >> (64 - k));
      }

      I really wish there were a solid cross-platform library for bitwise operations like these which resolve to native instructions where possible, and fall back to decent software implementations on architectures where those instructions aren’t present. Maybe there is and I don’t know what it’s called, or there is a bigger picture I’m missing.
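      One pragmatic option in the meantime (a sketch, not a real library; the helper name is mine) is to wrap the compiler builtins, which already provide exactly this fallback behavior:

```c
#include <stddef.h>
#include <stdint.h>

/* GCC and clang provide __builtin_popcountll, which becomes a popcnt
   instruction when the target supports it and a software routine
   otherwise; on other compilers we fall back to the K&R loop. */
static inline size_t count_bits(uint64_t n) {
#if defined(__GNUC__) || defined(__clang__)
    return (size_t)__builtin_popcountll(n);
#else
    size_t count;
    for (count = 0; n; count++) n &= n - 1;
    return count;
#endif
}
```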

  8. Derek Jones says:

    x &= x - 1 is the first example in Warren’s book “Hacker’s Delight”.
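
    That identity clears the lowest set bit each time it is applied; a quick sketch (the helper name is mine):

```c
#include <stdint.h>

/* x & (x - 1) clears the lowest set bit of x, which is why the
   K&R counting loop runs exactly one iteration per set bit. */
static inline uint64_t clear_lowest_set_bit(uint64_t x) {
    return x & (x - 1);
}
```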

    I think one measure of optimization quality is producing the same code for all programs that do the same thing:
    http://shape-of-code.coding-guidelines.com/2011/07/24/compiler-benchmarking-for-the-21st-century/

  9. Sigi says:

    That’s not magic when you have a basic understanding of Hamming weight.

    C is a general-purpose language. You can use it as you like.

    Disable all compiler optimizations and then look at the asm code.

  10. Some cool compiler optimization stuff going on with Spark recently:

    https://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware/

  11. Chagor Lango says:

    And how many CPU cycles does it take for popcnt to complete? The compiler might save on program size, but you’re ultimately dependent on microcode for non-trivial instructions. That’s not magic either.

    1. popcnt has a throughput of 1 instruction every cycle on recent x64 processors.

  12. > What does that mean? It means that C is a high-level language.

    No. This means that C has high-level compilers, if anything.

    They happen to recognize certain patterns in the code as special cases for optimization, and manage to compile them down to a single assembly instruction.

    C is still as low level as it was.