On Tue, May 16, 2023 at 7:54 AM Sami Tolvanen <samitolvanen@xxxxxxxxxx> wrote:
>
> On Mon, May 15, 2023 at 2:39 PM Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> >
> > On Mon, 15 May 2023, Masahiro Yamada wrote:
> >
> > > When CONFIG_TRIM_UNUSED_KSYMS is enabled, Kbuild recursively traverses
> > > the directory tree to determine which EXPORT_SYMBOL to trim. If an
> > > EXPORT_SYMBOL turns out to be unused by anyone, Kbuild begins a
> > > second traversal, where some source files are recompiled with their
> > > EXPORT_SYMBOL() turned into a no-op.
> > >
> > > Linus stated negative opinions about this slowness in commits:
> > >
> > >  - 5cf0fd591f2e ("Kbuild: disable TRIM_UNUSED_KSYMS option")
> > >  - a555bdd0c58c ("Kbuild: enable TRIM_UNUSED_KSYMS again, with some guarding")
> > >
> > > We can do this better now. The final data structures of EXPORT_SYMBOL
> > > are generated by the modpost stage, so modpost can selectively emit
> > > KSYMTAB entries that are really used by modules.
> > >
> > > Commit 2cce989f8461 ("kbuild: unify two modpost invocations") is another
> > > piece of groundwork for doing this in a one-pass algorithm. With the list
> > > of modules, modpost sets sym->used if it is used by a module. modpost
> > > emits KSYMTAB only for symbols with sym->used==true.
> > >
> > > BTW, Nicolas explained why the trimming was implemented with recursion:
> > >
> > >   https://lore.kernel.org/all/2o2rpn97-79nq-p7s2-nq5-8p83391473r@xxxxxxxxxxx/
> > >
> > > Actually, we never achieved the level of optimization where the chain
> > > reaction of trimming comes into play, because:
> > >
> > >  - CONFIG_LTO_CLANG cannot remove any unused symbols
> > >  - CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is enabled only for vmlinux,
> > >    not for modules
> >
> > I did achieve it using LTO with gcc back then. See the section called
> > "The tree that hides the forest" of https://lwn.net/Articles/746780/ for
> > example results.
>
> Clang can do similar optimizations, but not in relocatable links where
> the linker must obviously preserve all the globals.

Yeah, the issue is not in the compiler itself but in the way
CONFIG_LTO_CLANG was implemented.

If it had been implemented in the final link stage, it would have
required running LTO three times with CONFIG_KALLSYMS=y.
On the other hand, scripts/generate_initcall_order.pl would have been
unnecessary, and maybe we would get a slightly better vmlinux.

I think the help text of CONFIG_LTO_CLANG_FULL is misleading.
We did not achieve the deeper trimming described at this link:

  https://llvm.org/docs/LinkTimeOptimization.html

If I remember correctly, GCC LTO was implemented in the final link
stage. So, trimming was deeper, but LTO ran three times.

> A while ago there
> was a suggestion of adding an option to LLD that allows one to pass a
> list of symbols to preserve in relocatable LTO links, which would
> allow us to better optimize vmlinux.o. However, I haven't had a chance
> to look into this deeper than this proof of concept:
>
> https://reviews.llvm.org/D142163

Interesting.

But scripts/generate_initcall_order.pl would still be needed, right?
--lto-export-symbol-list is a list of symbols, but it does not
specify the correct order, does it?

Nicolas explained the chain reaction of compiling modules with LTO,
but I am skeptical of it, because modules are always relocatable ELF.

The LWN article (https://lwn.net/Articles/746780/) is awesome,
but I think the benefit of LTO applies to vmlinux, not to modules.

>
> > > If deeper trimming is required, we need to revisit this, but I guess
> > > that is unlikely to happen.
> >
> > Would have been nicer to keep this possibility as an option. The code is
> > already there and working as intended. The build cost is intrinsic to
> > the approach of course. The actual bug is to impose that cost onto
> > people who didn't explicitly ask for it.
> >
> > But I'm no longer fighting this battle.
>
> I agree, this looks like a reasonable solution for now.
>
> Sami

--
Best Regards
Masahiro Yamada
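The one-pass trimming described in the quoted patch (modpost sets sym->used for every exported symbol that some module references, then emits KSYMTAB entries only for symbols with sym->used==true) can be illustrated with the minimal sketch below. It is written in Python for brevity (modpost itself is C), and the Symbol class, the mark_used/emit_ksymtab helpers, and all symbol and module names are hypothetical, not modpost's actual code or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Symbol:
    """Hypothetical, simplified stand-in for an exported symbol.
    modpost's real struct symbol carries much more state."""
    name: str
    used: bool = False


def mark_used(exports: dict[str, Symbol], modules: dict[str, set[str]]) -> None:
    """Single pass over every module's undefined references:
    set used=True on each exported symbol that some module needs."""
    for undefs in modules.values():
        for name in undefs:
            sym = exports.get(name)
            if sym is not None:
                sym.used = True


def emit_ksymtab(exports: dict[str, Symbol]) -> list[str]:
    """Names that would get a KSYMTAB entry: only symbols marked used."""
    return sorted(name for name, sym in exports.items() if sym.used)


# Hypothetical inputs: exported symbols and the undefined symbols of two modules.
exports = {n: Symbol(n) for n in ("foo_register", "foo_unregister", "bar_helper")}
modules = {
    "drv_a.ko": {"foo_register", "foo_unregister"},
    "drv_b.ko": {"foo_register", "some_other_module_sym"},
}

mark_used(exports, modules)
print(emit_ksymtab(exports))  # ['foo_register', 'foo_unregister']
```

Because marking only needs the complete set of module references, a single pass is enough. The sketch does not model the chain reaction discussed above, where dropping one export lets LTO discard code that was the last user of another export; capturing that would require repeating the pass until nothing changes.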