Daniel Lemire's blog


It is more complicated than I thought: -mtune, -march in GCC

17 thoughts on “It is more complicated than I thought: -mtune, -march in GCC”

  1. Sounds like it’s an architecture identification bug. If you can replicate it with gcc-8.1 (or even better, the Git HEAD), report it on GCC’s bug tracker: https://gcc.gnu.org/bugzilla/

    1. Travis Downs says:

      It’s not a bug per se, because it happens when GCC is too old to know about the new arch. So it doesn’t happen (for Skylake) on newer GCC, but it would presumably still happen with a newer CPU uarch.

  2. GeorgeL says:

    Maybe it depends on your operating system and GCC version. On CentOS 7.5, with the native GCC 4.8.5 and even with the GCC 8.2 RC, setting -march=native also sets -mtune=native.

    On a Core i7-4790K CPU, with the native GCC 4.8.5:

    gcc -v
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
    Target: x86_64-redhat-linux
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
    Thread model: posix
    gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)

    you get the following for march and mtune:

    gcc -march=native -Q --help=target | egrep -- '-march=|-mtune' | cut -f3
    core-avx2
    core-avx2

    with the GCC 8.2 RC snapshot (reported as 8.1.1 right now):

    gcc -v
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=/opt/gcc-8.2.0-RC-20180719/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
    Target: x86_64-redhat-linux
    Configured with: ../configure --prefix=/opt/gcc-8.2.0-RC-20180719 --disable-multilib --enable-bootstrap --enable-plugin --with-gcc-major-version-only --enable-shared --disable-nls --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-install-libiberty --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++ --enable-initfini-array --disable-libgcj --enable-gnu-indirect-function --with-tune=generic --build=x86_64-redhat-linux --enable-lto --enable-gold
    Thread model: posix
    gcc version 8.1.1 20180719 (GCC)

    you get the following for march and mtune:

    gcc -march=native -Q --help=target | egrep -- '-march=|-mtune' | cut -f3
    haswell
    haswell

    and, specifically for the haswell target, you get the following for march and mtune:

    gcc -march=haswell -Q --help=target | egrep -- '-march=|-mtune' | cut -f3
    haswell
    haswell

    1. Travis Downs says:

      You need to run the test with a compiler that doesn’t know about your arch to make this interesting. In particular, for gcc 8 your results are as expected: Haswell is known by gcc and you are running on Haswell, so you get march and mtune set to Haswell.

      For the gcc 4.8.5 test, it isn’t clear what it means: core-avx2 is no longer a supported option for gcc (at least according to the manual); it reminds me of the icc options. It doesn’t make sense to tune for “core-avx2”, since that is not a micro-architecture, so it’s hard to say what gcc is doing internally. Perhaps this behavior changed in later versions of gcc.

      1. GeorgeL says:

        For the gcc 4.8.5 test, it isn’t clear what it means: core-avx2 is no
        longer a supported option for gcc (at least according to the manual):
        it reminds me of the icc options? It doesn’t make sense to tune for
        “core-avx2” since that is not a micro-architecture, so it’s hard to
        say what gcc is doing internally. Perhaps this behavior changed in
        later versions of gcc.

        Ah, I didn’t realise core-avx2 was no longer supported. That probably explains why I had issues compiling the PHP 7.3 alphas: on a Skylake CPU, they failed to compile with Zend Opcache on GCC 4.8.5 but compiled fine on GCC 7.3.1 🙂

  3. Travis Downs says:

    A note about the gcc documentation you mentioned:

    Specifying -march=cpu-type implies -mtune=cpu-type.

    It could be clearer: what it should say is that “Specifying -march=cpu-type implies -mtune=cpu-type if not otherwise explicitly specified.” I had always interpreted it that way, but probably because before reading it I had seen lots of examples where both are specified (indeed, the documentation hints at that usage).

    That is, it has always been the case that passing both -march and -mtune to the same compilation makes sense: you often want to target some fairly broad range of chips (say, since Sandy Bridge) but optimize for the chip you know will be the most common in your case in the immediate future (say Skylake).

    You can see some method to gcc’s madness here. When you tell gcc to use the instructions and tuning for your arch, you run into a problem if the arch is newer than gcc knows about. In that case, what gcc does for the “march” side of things differs from what it does for the “mtune” side.

    For the march, you are just talking about available instructions and instruction sets. Any version of GCC knows about some set of instruction sets, usually corresponding to the newest arch it knows about. It can also query the instruction sets supported by the current CPU. If the CPU is an unknown type, GCC could match its instruction sets against the arches it knows about and, if there is an exact match or a “superset match”, just use that arch. And so it does: it selects Broadwell, since from an ISA point of view Skylake is Broadwell. (Skylake may support a few extra instructions, such as MPX, but since gcc doesn’t know about them, it wouldn’t query for them, so this logic probably gets the same result whether it uses an exact match or a superset match.)

    Another way of looking at it is that -march=broadwell is just a shortcut for specifying a long list of -m options like -mavx, -mavx2, -mpclmul, etc, and the same list can be generated for -march=native by querying the processor’s capabilities, which may then be compressed to something like -march=broadwell if it matches the list implied by Broadwell.
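To make the “superset match” idea concrete, here is a deliberately simplified shell sketch. The function name `pick_march`, the three-entry table, and the feature names are illustrative assumptions, not GCC’s real detection tables or logic (which live in the compiler driver and check far more). Given the ISA features a CPU reports, it prints the newest known arch whose required features are all present; extra features it doesn’t know about are simply ignored.

```shell
# pick_march (hypothetical sketch): print the newest arch from a tiny,
# simplified table whose required ISA features all appear in the arguments.
# NOT GCC's real tables or logic; it only illustrates "superset matching".
pick_march() {
  cpu=" $* "          # pad with spaces so features match as whole words
  best=x86-64         # fall back to the baseline x86-64 profile
  # Entries are ordered oldest to newest, so the last match wins.
  for entry in \
      "sandybridge:avx" \
      "haswell:avx avx2 bmi2 fma" \
      "broadwell:avx avx2 bmi2 fma adx rdseed"
  do
    name=${entry%%:*}   # part before the colon
    feats=${entry#*:}   # part after the colon
    ok=1
    for f in $feats; do
      case $cpu in
        *" $f "*) ;;              # feature reported by the CPU
        *) ok=0; break ;;         # a required feature is missing
      esac
    done
    if [ "$ok" -eq 1 ]; then best=$name; fi
  done
  echo "$best"
}
```

For example, `pick_march avx avx2 bmi2 fma adx rdseed mpx` prints `broadwell`: the unknown `mpx` flag does not prevent the match, which is roughly how a Skylake can look like a Broadwell to a compiler that predates it.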

    All this is good because it prevents a huge regression when using -march=native: if it didn’t do this, then when you upgraded your CPU you’d suddenly lose access to AVX2, AVX, any version of SSE greater than 2, and so on, since gcc would just say “Oh, I don’t know about this CPU, so I’ll use the base x86-64 profile”. So I think we can say gcc is doing a reasonable thing on the -march side of things.

    That leaves -mtune. The main problem, as you put it, is that -march=native implies (for example) -march=broadwell on Skylake chips when gcc doesn’t know about Skylake, but it does not imply -mtune=broadwell. In fact, in this particular case, -mtune=broadwell would be the best option: -mtune=generic is worse.

    We know that, however, only with the benefit of hindsight: Skylake performs very much like Broadwell (which performs essentially identically to Haswell before it), so Broadwell is a good tune for Skylake. That certainly hasn’t always been the case, though: when the switch to the P4 uarch was made, the tune for the “previous” arch would have been a bad match for P4, and the same was true when P4 was in turn dropped in favor of a return to the PPro/PentiumM architecture.

    So the rule of “use the latest arch (from the same manufacturer?)” would have worked well recently but not in the past. It would also have trouble when a manufacturer doesn’t have a linear list of architectures but also has various secondary architectures, like Intel with Atom and the Phi/Knights* stuff.

    The rule of “use generic tune” seems like a reasonable compromise, and also has the advantage of being easier to implement: no need to implement an ordering of architectures or deal with the various families etc. So even though I originally thought this was really dumb, I can see the logic.

    Last note. You write:

    By default, when unspecified, “-mtune=generic” applies which means…

    I think you know this, but one should be clear that this only applies if you don’t also specify -march. Usually you want to specify -march, since the difference there is huge (newer instruction sets), and -mtune comes along for the ride.

  4. Travis Downs says:

    I hate not being able to edit comments, and this typo is too important. It should read:

    The main problem as you put is that -march=native implies (for
    example) -march=broadwell on Skylake chips when gcc doesn’t know about
    Skylake, but it does not imply -mtune=broadwell
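One way to spot this situation on your own machine is to compare the two values that `gcc -march=native -Q --help=target` reports. The shell function below is a small sketch (the name `report_tuning` and the wording of its note are made up for illustration). It reads that output on stdin, prints the effective -march and -mtune, and adds a note when the tuning fell back to generic even though a specific arch was picked. It matches the option names exactly, so other options that merely start with -mtune are skipped.

```shell
# report_tuning (hypothetical helper): read the output of
# `gcc -Q --help=target` on stdin, print the effective -march and -mtune,
# and add a note when tuning fell back to "generic" while -march names a
# specific architecture.
report_tuning() {
  awk '
    # Match the option names exactly so that other -mtune... lines are skipped.
    $1 == "-march=" { arch = $2 }
    $1 == "-mtune=" { tune = $2 }
    END {
      printf "march=%s mtune=%s\n", arch, tune
      if (tune == "generic" && arch != "" && arch != "x86-64")
        printf "note: -march=%s but -mtune=generic; pass -mtune explicitly if you want matching tuning\n", arch
    }
  '
}

# Usage (needs GCC on the PATH):
#   gcc -march=native -Q --help=target | report_tuning
```

If the note appears, you can pass -mtune yourself (for instance -march=native -mtune=broadwell on a Skylake with a GCC that predates Skylake) rather than relying on the implied value.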

  5. Thanks. This is an appropriate and timely bit of information, given my upcoming exercise. 🙂

    I can somewhat understand the choice of compiler-default behaviors, but also expect it might wander a bit between versions. This should not matter for most folk, for most problems, but if you are working a problem targeted for a specific processor, this stuff matters.

  6. For the longest time, a codebase I worked on had -march=native -mtune=native. It was just easier to let GCC figure things out instead of specifying the actual values, and it worked, so why bother?

    But it does matter. And this article is a great link to share with people who don’t know that.

    The reason I had to change the code base was virtual machines. Some of the build was being done in a QEMU VM, so the CPU reported by procinfo was a QEMU virtual CPU. This broke the build entirely, since GCC couldn’t figure out what the CPU architecture was. If it hadn’t been for that, I would not have been aware of the issues with -march=native -mtune=native. So thank you for writing the article and bringing this to more people’s attention.

  7. gcc-8.2 fixes the Skylake identification bug: https://www.phoronix.com/scan.php?page=news_item&px=GCC-8.2-Relased

  8. me says:

    If the compiler does not know the actual architecture (you mentioned that Broadwell is not correct, just close enough), how is it going to know that Broadwell tuning is more appropriate than generic tuning? After all, the CPU is apparently not a Broadwell.

    It seems consistent to me to apply generic tuning to a CPU about which the compiler does not (yet) have enough details. It cannot just assume that Broadwell tuning is the best choice for all future Broadwell successors.

    1. It seems consistent to me to apply generic tuning to a CPU about which the compiler does not (yet) have enough details.

      It is not wrong, but I would argue that it is not possible to infer this behaviour from the documentation. So the net result is a surprise, and surprises are not good.

  9. Quentin N says:

    This topic is one of the longest-running threads in compiler development, and this is a great post: it asks the key question, shows some valuable introspection tools, and explains the general state of things.

    The two key discussions are 1) march is generally incrementally inclusive across processor models/capabilities, and 2) the tools themselves adapt over time to the available models.

    Worth noting that the underlying tools (assembler, linker) can be sensitive to these variables.

    I wish gcc and clang would both auto-generate docs to show the tune/arch/HW (and if dependent on the OS) decision tree. Maybe I need to pony up some open source development effort…

  10. Aaron Max Fein says:

    Great thread indeed, very cool to get a better grip on this… was making the same assumptions and occasionally wondered about it… 🙂

  11. Martin Guttman says:

    I find it to be more of a broad-wording issue in the documentation than a bug per se. Where it says:

    Specifying -march=cpu-type implies -mtune=cpu-type

    It means exactly cpu-type, not attribute-option. Since native is not a cpu-type but rather an instruction to the compiler to try to match the current architecture, it does not cascade to the -mtune option, which is well within the wording. The wording is confusing, but correct.

    1. I am not sure I ever believed it was a bug. It is just complicated.

      1. Mingye Wang says:

        For what it’s worth, on godbolt’s x86-64 gcc 13.2, “-march=native --help=target -Q” now gives whatever CPU the server happens to be using in “-mtune”. Trying the available versions, I found that GCC 7.2 gives a generic mtune, but GCC 7.3 gives native. I am a bit too lazy to find the commit for now.