25th July 2018, 18 min read

It is more complicated than I thought: -mtune, -march in GCC

17 thoughts on “It is more complicated than I thought: -mtune, -march in GCC”

stefantalpalaru says:

July 25, 2018 at 8:49 pm

Sounds like it’s an architecture identification bug. If you can replicate it with gcc-8.1 (or even better, the Git HEAD), report it on GCC’s bug tracker: https://gcc.gnu.org/bugzilla/
1. Travis Downs says:
  
  July 28, 2018 at 5:36 pm
  
  It’s not a bug per se, because it happens when GCC is too old to know about the new arch. So it doesn’t happen (for Skylake) on newer GCC, but it would presumably still happen with a newer CPU uarch.
GeorgeL says:

July 25, 2018 at 10:01 pm

Maybe it depends on your operating system and GCC version. On CentOS 7.5, with the native GCC 4.8.5 and even with GCC 8.2 RC, setting -march=native also means -mtune=native is set.

On Core i7 4790K cpu

with GCC 4.8.5 native

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)

you get for march and mtune

gcc -march=native -Q --help=target | egrep -- '-march=|-mtune' | cut -f3
core-avx2
core-avx2

with GCC 8.2 RC snapshot reported as 8.1.1 right now

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/gcc-8.2.0-RC-20180719/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/opt/gcc-8.2.0-RC-20180719 --disable-multilib --enable-bootstrap --enable-plugin --with-gcc-major-version-only --enable-shared --disable-nls --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-install-libiberty --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++ --enable-initfini-array --disable-libgcj --enable-gnu-indirect-function --with-tune=generic --build=x86_64-redhat-linux --enable-lto --enable-gold
Thread model: posix
gcc version 8.1.1 20180719 (GCC)

you get for march and mtune

gcc -march=native -Q --help=target | egrep -- '-march=|-mtune' | cut -f3
haswell
haswell

and specifically for haswell target you get for march and mtune

gcc -march=haswell -Q --help=target | egrep -- '-march=|-mtune' | cut -f3
haswell
haswell
1. Travis Downs says:
  
  July 26, 2018 at 3:20 am
  
  You need to run the test with a compiler that doesn’t know about your arch to make this interesting. In particular, for gcc 8 your results are as expected: Haswell is known by gcc and you are running on Haswell, so you get march and mtune set to Haswell.
  
  For the gcc 4.8.5 test, it isn’t clear what it means: core-avx2 is no longer a supported option for gcc (at least according to the manual). It reminds me of the icc options. It doesn’t make sense to tune for “core-avx2” since that is not a micro-architecture, so it’s hard to say what gcc is doing internally. Perhaps this behavior changed in later versions of gcc.
  1. GeorgeL says:
    
    July 26, 2018 at 9:24 am
    
    For the gcc 4.8.5 test, it isn’t clear what it means: core-avx2 is no
    longer a supported option for gcc (at least according to the manual):
    it reminds me of the icc options? It doesn’t make sense to tune for
    “core-avx2” since that is not a micro-architecture, so it’s hard to
    say what gcc is doing internally. Perhaps this behavior changed in
    later versions of gcc.
    
    Ah, I didn’t realise core-avx2 was no longer supported. That probably explains why I had issues compiling PHP 7.3 alphas: on a Skylake CPU, it failed to compile with Zend Opcache on GCC 4.8.5 but compiled fine on GCC 7.3.1 🙂
Travis Downs says:

July 25, 2018 at 10:27 pm

A note about the gcc documentation you mentioned:

Specifying -march=cpu-type implies -mtune=cpu-type.

It could be clearer: what it should say is that “Specifying -march=cpu-type implies -mtune=cpu-type if not otherwise explicitly specified.” I had always interpreted it that way, but probably because before reading it I had seen lots of examples where both are specified (indeed, the documentation hints at that usage).

That is, it has always been the case that passing both -march and -mtune to the same compilation makes sense: you often want to target some fairly broad range of chips (say, since Sandy Bridge) but optimize for the chip you know will be the most common in your case in the immediate future (say Skylake).
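As a minimal sketch of that usage, assuming a GCC recent enough to know both names: the baseline instruction set comes from -march and the scheduling choices from -mtune. The variable names and the file name in the comment are hypothetical.

```shell
# Baseline instruction set: any chip from Sandy Bridge onward can run the binary.
ARCH=sandybridge
# Tuning target: the chip expected to be most common (Skylake in this example).
TUNE=skylake
CFLAGS="-O2 -march=${ARCH} -mtune=${TUNE}"
echo "$CFLAGS"
# Hypothetical usage: gcc $CFLAGS -c hot_loop.c -o hot_loop.o
```

Because -mtune only affects code generation choices and not which instructions are allowed, the result should still run on every chip covered by the -march baseline.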

You can see some method to gcc’s madness here. With -march=native you ask gcc to use the instructions and tuning for your arch, but you run into a problem when the arch is newer than gcc knows about. In that case, what gcc does is different for the “march” side of things versus the “mtune” side.

For the march, you are just talking about available instructions and instruction sets. Any version of GCC knows about some set of instruction sets, usually corresponding to the newest arch it knows about. It can also query the instruction sets supported by the current CPU. If it is an unknown type, it could match it against the arches it knows about and if there is an exact match or a “superset match” it could just use that. And so it does: it selects Broadwell since from an ISA point of view, Skylake is Broadwell (Skylake may support a few extra instructions such as MPX, but since gcc doesn’t know about them, it wouldn’t query for them and so this logic probably gets the same result whether it is using exact match or superset match).

Another way of looking at it is that -march=broadwell is just a shortcut for specifying a long list of -m options like -mavx, -mavx2, -mpclmul, etc, and the same list can be generated for -march=native by querying the processor’s capabilities, which may then be compressed to something like -march=broadwell if it matches the list implied by Broadwell.
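One way to look at that list yourself is to filter the output of `gcc -Q --help=target`. The helper below is only a sketch: the name `enabled_flags` is made up, and it assumes the usual output layout where each boolean option is followed by `[enabled]` or `[disabled]`.

```shell
# Read `gcc -Q --help=target` output on stdin and print the options
# reported as [enabled], one per line, sorted.
enabled_flags() {
  awk '$2 == "[enabled]" { print $1 }' | sort
}

# Hypothetical usage on a machine with gcc installed:
#   gcc -march=broadwell -Q --help=target | enabled_flags > broadwell.txt
#   gcc -march=native    -Q --help=target | enabled_flags > native.txt
#   diff broadwell.txt native.txt
```

If the two lists come out identical, then from gcc’s point of view the native CPU and Broadwell expose the same instruction sets.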

All this is good because it prevents a huge regression when using -march=native: if it didn’t do this, then when you upgraded your CPU you’d suddenly lose access to AVX2, AVX, any version of SSE greater than 2 and so on, since gcc would just be like “Oh, I don’t know about this CPU so I’ll use the base x86-64 profile”. So I think we can say gcc is doing a reasonable thing on the -march side of things.

That leaves -mtune. The main problem as you put it is that -march=native implies (for example) -mtune=broadwell on Skylake chips when gcc doesn’t know about Skylake, but it does not imply -mtune=broadwell. In fact, in this particular case, -mtune=broadwell would be the best option: -mtune=generic is worse.

We know that, however, only with the benefit of hindsight: Skylake performs very much like Broadwell (which performs essentially identically to Haswell before it), so Broadwell is a good tune for Skylake. That certainly hasn’t always been the case though: when the switch to the P4 uarch was made, the tune for the “previous” arch would have been a bad match for P4, and the same was true when P4 was in turn dropped in favor of a return to the PPro/PentiumM architecture.

So the rule of “use the latest arch (from same manufacturer?)” would have worked well recently but not in the past. It would also have trouble when some manufacturer doesn’t have a linear list of architectures, but rather also has various secondary architectures, like Intel with Atom and the Phi/Knights* stuff.

The rule of “use generic tune” seems like a reasonable compromise, and also has the advantage of being easier to implement: no need to implement an ordering of architectures or deal with the various families etc. So even though I originally thought this was really dumb, I can see the logic.
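To make the rule concrete, here is a toy model of it. This is not GCC’s actual code: the function `pick_tune` and the lists of known names passed to it are invented purely for illustration.

```shell
# Toy model of the fallback rule, not GCC's implementation.
# $1 = detected microarchitecture; remaining arguments = names the compiler knows.
pick_tune() {
  detected="$1"
  shift
  for known in "$@"; do
    if [ "$known" = "$detected" ]; then
      # Known chip: tune for it directly.
      echo "$detected"
      return 0
    fi
  done
  # Unknown chip: no ordering of architectures needed, just fall back.
  echo "generic"
}

pick_tune skylake haswell broadwell   # prints: generic
pick_tune haswell haswell broadwell   # prints: haswell
```

The appeal of this version is that it needs no notion of which architecture is the “successor” of which.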

Last note. You write:

By default, when unspecified, “-mtune=generic” applies which means…

I think you know this, but one should be clear that this only applies if you don’t also specify -march. Usually you want to specify -march since the difference there is huge (newer instruction sets), and -mtune comes along for the ride.
Travis Downs says:

July 25, 2018 at 10:30 pm

I hate no editing capabilities, and this typo is too important: it should read:

The main problem as you put it is that -march=native implies (for
example) -march=broadwell on Skylake chips when gcc doesn’t know about
Skylake, but it does not imply -mtune=broadwell
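To see which case applies on a given machine, the two values can be pulled out of `gcc -Q --help=target`. This is a sketch only: the helper name `arch_and_tune` is made up, and the sample input is not a captured log, just two lines modelled on the situation described above.

```shell
# Read `gcc -Q --help=target` output on stdin and print "<march> <mtune>".
# Matching the first field exactly skips related lines such as -mtune-ctrl=.
arch_and_tune() {
  awk '$1 == "-march=" { arch = $2 }
       $1 == "-mtune=" { tune = $2 }
       END { print arch, tune }'
}

# Hypothetical usage: gcc -march=native -Q --help=target | arch_and_tune
# Made-up sample input for an old gcc on a newer CPU:
printf '  -march=  \t\tbroadwell\n  -mtune=  \t\tgeneric\n' | arch_and_tune
```

If the two printed names differ, -march=native did not carry over to the tuning, and passing -mtune explicitly may be worth trying.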
Preston L. Bannister says:

July 26, 2018 at 12:26 am

Thanks. This is an appropriate and timely bit of information, given my upcoming exercise. 🙂

I can somewhat understand the choice of compiler-default behaviors, but also expect it might wander a bit between versions. This should not matter for most folk, for most problems, but if you are working a problem targeted for a specific processor, this stuff matters.
Shalom Craimer says:

July 26, 2018 at 8:10 am

For the longest time, a codebase I worked on had -march=native -mtune=native. It was just easier to let GCC figure things out instead of specifying the actual values, and it worked, so why bother?

But it does matter. And this article is a great link to share with people who don’t know that.

The reason I had to change the code base was virtual machines. Some of the build was being done in a QEMU VM, so the CPU returned from procinfo was a QEMU. This broke the build entirely, since GCC couldn’t figure out what the CPU architecture was. But if it hadn’t been for that, I would not have been aware of the issues with -march=native -mtune=native. So thank you for writing the article to bring this to more people’s attention.
stefantalpalaru says:

July 27, 2018 at 12:42 am

gcc-8.2 fixes the Skylake identification bug: https://www.phoronix.com/scan.php?page=news_item&px=GCC-8.2-Relased
me says:

July 29, 2018 at 7:45 am

If the compiler does not know the actual architecture – you mentioned that broadwell is not correct, just close enough – how is it going to know that tuning for broadwell is more appropriate than tuning generic? Because apparently it is not a broadwell.

It seems consistent to me to apply generic tuning for a CPU about which the compiler does not (yet) have enough details. It cannot just assume that broadwell tuning is the best choice for all future broadwell successor CPUs.
1. Daniel Lemire says:
  
  July 29, 2018 at 2:02 pm
  
  It seems consistent to me to apply generic tuning for a CPU about which the compiler does not (yet) have enough details.
  
  It is not wrong, but I would argue that it is not possible to infer this behaviour from the documentation. So the net result is a surprise, and surprises are not good.
Quentin N says:

October 16, 2018 at 11:54 pm

One of the longest-running threads in compiler development. This is a great post with the key question asked, some valuable introspection tools, and the general state of things explained.

The two key discussions are 1) march is generally incrementally inclusive across processor models/capabilities, and 2) the tools themselves adapt over time to the available models.

Worth noting that the underlying tools (assembler, linker) can be sensitive to these variables.

I wish gcc and clang would both auto-generate docs to show the tune/arch/HW (and if dependent on the OS) decision tree. Maybe I need to pony up some open source development effort…
Aaron Max Fein says:

April 24, 2019 at 11:29 pm

Great thread indeed, very cool to get a better grip on this… was making the same assumptions and occasionally wondered about it… 🙂
Martin Guttman says:

August 20, 2019 at 3:48 pm

I find it to be more an issue of broad wording in the documentation than a bug per se. Where it says:

Specifying -march=cpu-type implies -mtune=cpu-type

It means exactly cpu-type, not attribute-option. Since native is not a cpu-type but rather a compiler instruction to try to match the current architecture, it does not cascade to the -mtune option, and that is well within the wording. The wording is confusing, but correct.
1. Daniel Lemire says:
  
  August 20, 2019 at 5:01 pm
  
  I am not sure I ever believed it was a bug. It is just complicated.
  1. Mingye Wang says:
    
    August 26, 2023 at 1:34 pm
    
    For what it’s worth, on godbolt’s x86-64 gcc 13.2, “-march=native --help=target -Q” now gives whatever CPU the server happens to be using in “-mtune”. Using the available versions I found that GCC 7.2 gives generic mtune, but GCC 7.3 does native. I am a bit too lazy to find the commit for now.