Re: [PATCH] kbuild: pass jobserver to cmd_ld_vmlinux.o

On 2022-06-18, Masahiro Yamada wrote:
(+LLVM list, Fangrui Song)

Thanks for tagging me. I'll clarify some stuff.

On Fri, Jun 17, 2022 at 7:41 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:

On Fri, Jun 17, 2022 at 12:35 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
>
> On Fri, Jun 17, 2022 at 12:53 AM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >
> > On Thu, Jun 16, 2022 at 4:09 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> > >
> > > On Thu, Jun 16, 2022 at 12:45 PM Jiri Slaby <jslaby@xxxxxxx> wrote:
> > > >
> > > > Until the link-vmlinux.sh split (cf. the commit below), the linker was
> > > > run with jobserver set in MAKEFLAGS. After the split, the command in
> > > > Makefile.vmlinux_o is not prefixed by "+" anymore, so this information
> > > > is lost.
> > > >
> > > > Restore it as linkers working in parallel (esp. the LTO ones) make a use
> > > > of i

Hi Jiri,

Please let me clarify first.

Is it OK to assume that you are talking about Clang LTO rather than
GCC LTO, since the latter is not upstreamed?





I tested this patch but I did not see any performance change for Clang LTO.


[1] CONFIG_CLANG_LTO_FULL

  lld always runs sequential.
  It never runs in parallel even if you pass the -j option to Make.

"lld always runs sequential" is not accurate. There are a number of
parallel linker passes.  ld.lld --threads= defaults to
llvm::hardware_concurrency (similar to
https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency,
but uses sched_getaffinity to compute the number of available cores).
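
To illustrate the difference between "cores in the machine" and "cores
available to this process", here is a minimal Python sketch. It is not
lld's code; the helper name available_cpus is made up for this example,
and it only mirrors the idea of consulting the affinity mask (via
os.sched_getaffinity, which is Linux-specific) before falling back to
the total CPU count.

```python
import os


def available_cpus():
    """Return the number of CPUs this process is allowed to run on.

    Hypothetical helper for illustration only. It prefers the affinity
    mask, which is what taskset or cgroup cpusets restrict, and falls
    back to the total CPU count where no affinity API is available.
    """
    if hasattr(os, "sched_getaffinity"):
        # The set of CPU ids the calling process (pid 0 = self) may use.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("CPUs in the system:     ", os.cpu_count())
    print("CPUs usable by this pid:", available_cpus())
```

Running this under something like "taskset -c 0" should report a single
usable CPU even on a many-core machine, which is the situation in which
a default thread count derived from the affinity mask drops to 1.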

"lld always runs sequential" is correct only when --threads=1 is
specified or the system provides only one thread to the lld process.

I think people may be more interested in LTO parallelism here.  Regular
LTO (sometimes called full LTO when there is mixed-thin-and-regular LTO)
supports limited parallelism, which applies to code generation but not
to IR-level optimization.  (IR-level optimization has many
interprocedural optimization passes, and splitting the work would make
LTO less effective. Code generation is per function, so parallelism
there does not regress optimization.)


[2] CONFIG_CLANG_LTO_THIN

  lld always runs in parallel even if you do not pass the -j option.

  On my machine, lld always allocated 12 threads,
  irrespective of Make's parallelism.




One more thing: if a program wants to participate in
Make's jobserver, it must parse MAKEFLAGS and extract the
file descriptors used to communicate with the jobserver.

As a code example in the kernel tree,
scripts/jobserver-exec parses "MAKEFLAGS" and "--jobserver".
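
As a rough illustration of that parsing step, here is a minimal Python
sketch. It is not the code from scripts/jobserver-exec; the function
name parse_jobserver_auth is invented for this example. It assumes the
"--jobserver-auth=R,W" form (and the older "--jobserver-fds=" spelling),
and also accepts the "fifo:PATH" form that newer GNU Make releases can
emit. When the option appears more than once, the last instance is used.

```python
import os
import re


def parse_jobserver_auth(makeflags):
    """Extract jobserver connection details from a MAKEFLAGS string.

    Hypothetical helper for illustration only. Returns:
      - (read_fd, write_fd) for the "R,W" pipe form,
      - a path string for the "fifo:PATH" form,
      - None when no jobserver option is present.
    A malformed value raises ValueError.
    """
    values = re.findall(r"--jobserver-(?:auth|fds)=(\S+)", makeflags)
    if not values:
        return None
    value = values[-1]  # only the last instance is relevant
    if value.startswith("fifo:"):
        return value[len("fifo:"):]
    read_fd, write_fd = value.split(",")
    return int(read_fd), int(write_fd)


if __name__ == "__main__":
    print(parse_jobserver_auth(os.environ.get("MAKEFLAGS", "")))
```

A real participant would then read one byte from the read end for each
extra job it wants to run and write the same byte back when that job
finishes; this sketch stops at locating the descriptors.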


I grepped the lld source code, but it does not contain
"MAKEFLAGS" or "jobserver".

masahiro@oscar:~/ref/lld$ git remote  show origin
* remote origin
 Fetch URL: https://github.com/llvm-mirror/lld.git
 Push  URL: https://github.com/llvm-mirror/lld.git
 HEAD branch: master
 Remote branches:
   master     tracked
   release_36 tracked
   release_37 tracked
   release_38 tracked
   release_39 tracked
   release_40 tracked
   release_50 tracked
   release_60 tracked
   release_70 tracked
   release_80 tracked
   release_90 tracked
 Local branch configured for 'git pull':
   master merges with remote master
 Local ref configured for 'git push':
   master pushes to master (up to date)
masahiro@oscar:~/ref/lld$ git grep MAKEFLAGS
masahiro@oscar:~/ref/lld$ git grep jobserver


So, in my research, LLD does not seem to support the jobserver.


Correct. lld does not support GNU make's jobserver.  On the other hand,
I don't think the jobserver implementation supports a flexible "give
this target N hardware concurrency" request. A heavy link target does
not necessarily get more resources than a quick target.

If a make target knows how much hardware concurrency it gets, we can
pass --threads= to lld. LTO easily takes 95+% of link time, so LTO
parallelism may need a dedicated setting. lld has --thinlto-jobs=.
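
For concreteness, a small Python sketch of how a build wrapper might
turn a known concurrency budget into those two lld options. The function
lld_parallelism_flags and its parameters are hypothetical; how the
budget is obtained (for example from the jobserver) is exactly the open
question above and is not solved here.

```python
def lld_parallelism_flags(threads, thinlto_jobs=None):
    """Build ld.lld options for a given concurrency budget.

    Hypothetical helper for illustration only.
    threads:      value for --threads= (general linker parallelism)
    thinlto_jobs: optional value for --thinlto-jobs= (ThinLTO backends)
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    flags = [f"--threads={threads}"]
    if thinlto_jobs is not None:
        if thinlto_jobs < 1:
            raise ValueError("thinlto_jobs must be at least 1")
        flags.append(f"--thinlto-jobs={thinlto_jobs}")
    return flags


if __name__ == "__main__":
    # Example: 4 threads for the linker passes, 8 ThinLTO backend jobs.
    print(" ".join(lld_parallelism_flags(4, thinlto_jobs=8)))
```

Keeping the two values separate reflects the point above: the LTO
backends dominate link time, so one may want to size them independently
of the other linker passes.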




If you are talking about GCC LTO, yes, the code
tries to parse "--jobserver-auth=" from the MAKEFLAGS
environment variable.  [1]

[1]:  https://github.com/gcc-mirror/gcc/blob/releases/gcc-12.1.0/gcc/lto-wrapper.cc#L1341


But, as you may know, GCC LTO works in a different way;
at the very least, we cannot do it before modpost.


--
Best Regards
Masahiro Yamada



