The N factor in -flto=N can apparently be ignored

"R. Diez via Gcc-help" <gcc-help@xxxxxxxxxxx> · Wed, 25 Oct 2023 08:19:56 +0000 (UTC)

Hi all:

I have built GCC 13.2 from source for embedded development. The target is an ARM Cortex-M4F, the firmware binary is around 400 KB, and the corresponding ELF file with debug information is about 8 MB. Linking with LTO takes around 4 seconds.

I am using a combination of a top-level makefile, which provides the jobserver, and an Autoconf/Automake project, which does NOT generate the '+' makefile rule prefix you need in order to inherit the jobserver file descriptors, so using the jobserver normally does not work.

I am also extracting the --jobserver-fds and --jobserver-auth options from MAKEFLAGS and manually reinjecting them into MAKEFLAGS, because my top-level makefiles tend to use flags like --warn-undefined-variables which should not be passed down to other makefiles. This is due to some GNU Make limitations, which I hear should improve in an upcoming release.
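
The extraction described above can be sketched in shell. This is only an illustration under stated assumptions: the function name jobserver_flags is hypothetical, the real makefiles may need to keep further options, and the words in MAKEFLAGS are assumed to contain no whitespace or glob characters.

```shell
#!/bin/sh
# Hypothetical helper (not part of GNU Make or GCC): print only the
# jobserver-related words of the MAKEFLAGS value given as $1, dropping
# everything else, such as --warn-undefined-variables.
# Assumption: no word contains whitespace or glob characters.
jobserver_flags() {
    out=""
    for word in $1; do    # intentionally unquoted: split on whitespace
        case "$word" in
            --jobserver-fds=*|--jobserver-auth=*)
                out="${out:+$out }$word" ;;
        esac
    done
    printf '%s\n' "$out"
}
```

Called as jobserver_flags "$MAKEFLAGS", it prints the words that were kept, separated by single spaces, or an empty line if MAKEFLAGS carried no jobserver option.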

To sum it up, I have the perfect combination for all kinds of hard-to-debug jobserver trouble.

The new GNU Make version 4.4.1 no longer uses inherited file descriptors for the jobserver by default, but named pipes instead. That should make it possible to use the jobserver in Autoconf/Automake projects. So I thought I should check whether that is actually working properly.

It turns out that it is hard to know whether GCC's LTO is using the jobserver or not. I couldn't find any verbose option where I could see it.
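
One indirect check is to look at what MAKEFLAGS advertises to the link command. The sketch below only classifies the MAKEFLAGS text. It does not prove that GCC actually uses the jobserver, and the function name jobserver_kind is hypothetical. The last matching word wins, on the assumption that a later option overrides an earlier one.

```shell
#!/bin/sh
# Hypothetical helper: classify the jobserver advertised in the MAKEFLAGS
# value given as $1. Prints one of:
#   fifo  (named pipe, --jobserver-auth=fifo:PATH)
#   fds   (inherited file descriptors, --jobserver-auth=R,W or --jobserver-fds=R,W)
#   none  (no jobserver option present)
jobserver_kind() {
    kind=none
    for word in $1; do    # intentionally unquoted: split on whitespace
        case "$word" in
            --jobserver-auth=fifo:*)             kind=fifo ;;
            --jobserver-auth=*|--jobserver-fds=*) kind=fds ;;
        esac
    done
    printf '%s\n' "$kind"
}
```

A wrapper script around the link command could print jobserver_kind "$MAKEFLAGS" to stderr. Note that in the "fds" case the descriptors may still be closed in the child process if the rule lacks the '+' prefix, so "fds" only means that the option is present.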

If you specify "-flto=jobserver", you do get a warning if the jobserver was not found, but if you specify "-flto" or "-flto=auto", you do not know. Using a jobserver is optional (the user decides), so you shouldn't always specify "-flto=jobserver". If GCC does not find the jobserver, and you are building several LTO projects in parallel, you could overload your PC and trigger the Linux out-of-memory killer.

The GCC manual is not very clear in this respect, but I am guessing that specifying "-flto" will use the jobserver if available, and then probably in a multithread or multiprocess fashion, and if no jobserver is found, GCC will then use a single thread or a single process. Otherwise, there would be no difference between "-flto" and "-flto=auto".

However, LTO linking always took 4 seconds, and always used several CPUs, regardless of jobserver presence. It did not max out all 12 cores, but the CPU load was around 80 %.

I then specified -flto=1, and to my surprise, GCC's behaviour was the same. I also tried -flto=2 and -flto=12 to no avail. I tried -flto=1 with no jobserver at all (no make -j flag), and still the same.

I thought that GCC was just ignoring the value passed to -flto in all situations. But then I rebuilt everything (not just linking), and I got this warning:

lto-wrapper: warning: using serial compilation of 10 LTRANS jobs
lto-wrapper: note: see the '-flto' option documentation for more information

And then linking was much slower than usual, as expected.

After some more testing, and looking at the compilation flags in the build output, I now suspect the following: if you compile your object files with -flto=jobserver, then linking with LTO will always use several threads. It does not matter whether a jobserver is available, or whether -flto=1 is specified during linking.
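
A possible way to check which -flto variant an object file was compiled with is to dump the options that GCC stores in LTO objects. The sketch below assumes that these options live in the .gnu.lto_.opts section and that the -flto option is recorded there, which should be verified against the GCC version in use. The function name lto_flags_in_dump and the file name foo.o are hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read text on stdin and print every word that starts
# with -flto, one per line. Words are separated by spaces, tabs or single
# quotes, so quoted options such as '-flto=jobserver' are handled too.
# Other options beginning with -flto (e.g. -flto-partition=...) also match.
lto_flags_in_dump() {
    tr -s " \t'" '\n\n\n' | grep -e '^-flto' || true
}

# Intended use (assumption: options are recorded in .gnu.lto_.opts):
#   readelf -p .gnu.lto_.opts foo.o | lto_flags_in_dump
```

If the suspicion is right, object files built with -flto=jobserver would show that option here even when the link command says -flto=1.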

Can someone else confirm this suspicion?

Thanks in advance,
  rdiez