On Wed, Dec 2, 2020 at 6:37 AM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@xxxxxxxxxxxxxxxx> wrote:
>
> This change adds build system support for Clang's Link Time
> Optimization (LTO). With -flto, instead of ELF object files, Clang
> produces LLVM bitcode, which is compiled into native code at link
> time, allowing the final binary to be optimized globally. For more
> details, see:
>
>   https://llvm.org/docs/LinkTimeOptimization.html
>
> The Kconfig option CONFIG_LTO_CLANG is implemented as a choice,
> which defaults to LTO being disabled. To use LTO, the architecture
> must select ARCH_SUPPORTS_LTO_CLANG and support:
>
>   - compiling with Clang,
>   - compiling inline assembly with Clang's integrated assembler,
>   - and linking with LLD.
>
> While using full LTO results in the best runtime performance, the
> compilation is not scalable in time or memory. CONFIG_THINLTO
> enables ThinLTO, which allows parallel optimization and faster
> incremental builds. ThinLTO is used by default if the architecture
> also selects ARCH_SUPPORTS_THINLTO:
>
>   https://clang.llvm.org/docs/ThinLTO.html
>
> To enable LTO, LLVM tools must be used to handle bitcode files. The
> easiest way is to pass the LLVM=1 option to make:
>
>   $ make LLVM=1 defconfig
>   $ scripts/config -e LTO_CLANG
>   $ make LLVM=1
>
> Alternatively, at least the following LLVM tools must be used:
>
>   CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm
>
> To prepare for LTO support with other compilers, common parts are
> gated behind the CONFIG_LTO option, and LTO can be disabled for
> specific files by filtering out CC_FLAGS_LTO.
>
> Signed-off-by: Sami Tolvanen <samitolvanen@xxxxxxxxxx>
> Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
> ---
>  Makefile                          | 19 ++++++-
>  arch/Kconfig                      | 88 +++++++++++++++++++++++++++++++
>  include/asm-generic/vmlinux.lds.h | 11 ++--
>  scripts/Makefile.build            |  9 +++-
>  scripts/Makefile.modfinal         |  9 +++-
>  scripts/Makefile.modpost          | 21 +++++++-
>  scripts/link-vmlinux.sh           | 32 ++++++++---
>  7 files changed, 171 insertions(+), 18 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 16b7f0890e75..f5cac2428efc 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -891,6 +891,21 @@ KBUILD_CFLAGS += $(CC_FLAGS_SCS)
>  export CC_FLAGS_SCS
>  endif
>
> +ifdef CONFIG_LTO_CLANG
> +ifdef CONFIG_LTO_CLANG_THIN
> +CC_FLAGS_LTO += -flto=thin -fsplit-lto-unit
> +KBUILD_LDFLAGS += --thinlto-cache-dir=$(extmod-prefix).thinlto-cache
> +else
> +CC_FLAGS_LTO += -flto
> +endif
> +CC_FLAGS_LTO += -fvisibility=default
> +endif
> +
> +ifdef CONFIG_LTO
> +KBUILD_CFLAGS += $(CC_FLAGS_LTO)
> +export CC_FLAGS_LTO
> +endif
> +
>  ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
>  KBUILD_CFLAGS += -falign-functions=32
>  endif
> @@ -1471,7 +1486,7 @@ MRPROPER_FILES += include/config include/generated \
>  		  *.spec
>
>  # Directories & files removed with 'make distclean'
> -DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS
> +DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS .thinlto-cache
>
>  # clean - Delete most, but leave enough to build external modules
>  #
> @@ -1717,7 +1732,7 @@ PHONY += compile_commands.json
>
>  clean-dirs := $(KBUILD_EXTMOD)
>  clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps \
> -	$(KBUILD_EXTMOD)/compile_commands.json
> +	$(KBUILD_EXTMOD)/compile_commands.json $(KBUILD_EXTMOD)/.thinlto-cache
>
>  PHONY += help
>  help:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 56b6ccc0e32d..30907b554451 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -598,6 +598,94 @@ config SHADOW_CALL_STACK
>  	  reading and writing arbitrary memory may be able to locate them
>  	  and hijack control flow by modifying the stacks.
>
> +config LTO
> +	bool
> +	help
> +	  Selected if the kernel will be built using the compiler's LTO feature.
> +
> +config LTO_CLANG
> +	bool
> +	select LTO
> +	help
> +	  Selected if the kernel will be built using Clang's LTO feature.
> +
> +config ARCH_SUPPORTS_LTO_CLANG
> +	bool
> +	help
> +	  An architecture should select this option if it supports:
> +	  - compiling with Clang,
> +	  - compiling inline assembly with Clang's integrated assembler,
> +	  - and linking with LLD.
> +
> +config ARCH_SUPPORTS_LTO_CLANG_THIN
> +	bool
> +	help
> +	  An architecture should select this option if it can support Clang's
> +	  ThinLTO mode.
> +
> +config HAS_LTO_CLANG
> +	def_bool y
> +	# Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
> +	depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
> +	depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
> +	depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
> +	depends on ARCH_SUPPORTS_LTO_CLANG
> +	depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
> +	depends on !KASAN
> +	depends on !GCOV_KERNEL
> +	depends on !MODVERSIONS
> +	help
> +	  The compiler and Kconfig options support building with Clang's
> +	  LTO.
> +
> +choice
> +	prompt "Link Time Optimization (LTO)"
> +	default LTO_NONE
> +	help
> +	  This option enables Link Time Optimization (LTO), which allows the
> +	  compiler to optimize binaries globally.
> +
> +	  If unsure, select LTO_NONE. Note that LTO is very resource-intensive
> +	  so it's disabled by default.
> +
> +config LTO_NONE
> +	bool "None"
> +	help
> +	  Build the kernel normally, without Link Time Optimization (LTO).
> +
> +config LTO_CLANG_FULL
> +	bool "Clang Full LTO (EXPERIMENTAL)"
> +	depends on HAS_LTO_CLANG
> +	select LTO_CLANG
> +	help
> +	  This option enables Clang's full Link Time Optimization (LTO), which
> +	  allows the compiler to optimize the kernel globally. If you enable
> +	  this option, the compiler generates LLVM bitcode instead of ELF
> +	  object files, and the actual compilation from bitcode happens at
> +	  the LTO link step, which may take several minutes depending on the
> +	  kernel configuration. More information can be found from LLVM's
> +	  documentation:
> +
> +	    https://llvm.org/docs/LinkTimeOptimization.html
> +

This help document is misleading.
People who read the document would misunderstand how great this feature would be.

This should be added in the commit log and Kconfig help:

  In contrast to the example in the documentation, Clang LTO
  for the kernel cannot remove any unreachable function or data.
  In fact, this results in an even bigger vmlinux and modules.

-- 
Best Regards
Masahiro Yamada