On Tue, Sep 29, 2020 at 2:46 PM Sami Tolvanen <samitolvanen@xxxxxxxxxx> wrote:
>
> This patch series adds support for building x86_64 and arm64 kernels
> with Clang's Link Time Optimization (LTO).
>
> In addition to performance, the primary motivation for LTO is
> to allow Clang's Control-Flow Integrity (CFI) to be used in the
> kernel. Google has shipped millions of Pixel devices running three
> major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM
> bitcode, which Clang produces with LTO instead of ELF object files,
> postponing ELF processing until a later stage, and ensuring initcall
> ordering.

Sami, thanks for continuing to drive the series. I encourage you to
keep resending with fixes accumulated or dropped on a weekly cadence.

The series worked well for me on arm64, but for x86_64 on mainline I
saw a stream of new objtool warnings:

testing your LTO series; x86_64 defconfig + CONFIG_THINLTO:
```
  LTO     vmlinux.o
  OBJTOOL vmlinux.o
vmlinux.o: warning: objtool: wakeup_long64()+0x61: indirect jump found in RETPOLINE build
vmlinux.o: warning: objtool: .text+0x308a: indirect jump found in RETPOLINE build
vmlinux.o: warning: objtool: .text+0x30c5: indirect jump found in RETPOLINE build
vmlinux.o: warning: objtool: copy_user_enhanced_fast_string() falls through to next function copy_user_generic_unrolled()
vmlinux.o: warning: objtool: __memcpy_mcsafe() falls through to next function mcsafe_handle_tail()
vmlinux.o: warning: objtool: memset() falls through to next function memset_erms()
vmlinux.o: warning: objtool: __memcpy() falls through to next function memcpy_erms()
vmlinux.o: warning: objtool: __x86_indirect_thunk_rax() falls through to next function __x86_retpoline_rax()
vmlinux.o: warning: objtool: __x86_indirect_thunk_rbx() falls through to next function __x86_retpoline_rbx()
vmlinux.o: warning: objtool: __x86_indirect_thunk_rcx() falls through to next function __x86_retpoline_rcx()
vmlinux.o: warning: objtool: __x86_indirect_thunk_rdx() falls through to next function __x86_retpoline_rdx()
vmlinux.o: warning: objtool: __x86_indirect_thunk_rsi() falls through to next function __x86_retpoline_rsi()
vmlinux.o: warning: objtool: __x86_indirect_thunk_rdi() falls through to next function __x86_retpoline_rdi()
vmlinux.o: warning: objtool: __x86_indirect_thunk_rbp() falls through to next function __x86_retpoline_rbp()
vmlinux.o: warning: objtool: __x86_indirect_thunk_r8() falls through to next function __x86_retpoline_r8()
vmlinux.o: warning: objtool: __x86_indirect_thunk_r9() falls through to next function __x86_retpoline_r9()
vmlinux.o: warning: objtool: __x86_indirect_thunk_r10() falls through to next function __x86_retpoline_r10()
vmlinux.o: warning: objtool: __x86_indirect_thunk_r11() falls through to next function __x86_retpoline_r11()
vmlinux.o: warning: objtool: __x86_indirect_thunk_r12() falls through to next function __x86_retpoline_r12()
vmlinux.o: warning: objtool: __x86_indirect_thunk_r13() falls through to next function __x86_retpoline_r13()
vmlinux.o: warning: objtool: __x86_indirect_thunk_r14() falls through to next function __x86_retpoline_r14()
vmlinux.o: warning: objtool: __x86_indirect_thunk_r15() falls through to next function __x86_retpoline_r15()
```

I think those should be resolved before I provide any kind of
Tested-by tag.

My other piece of feedback was that I like the default ThinLTO, but I
think the help text in the Kconfig, which is visible during menuconfig,
could be improved by informing the user of the tradeoffs. For example,
if CONFIG_THINLTO is disabled, it should be noted that full LTO will be
used instead. Also, that full LTO may produce slightly better optimized
binaries than ThinLTO, at the cost of not utilizing multiple cores when
linking, and thus being significantly slower to link. Maybe explaining
that setting it to "n" implies a full LTO build, which will be much
slower to link but possibly produce a slightly faster kernel, would be
good?
It's not visible unless LTO_CLANG and ARCH_SUPPORTS_THINLTO are
enabled, so I don't think you need to explain that THINLTO without
those is *not* full LTO. I'll leave the precise wording to you. WDYT?

Also, when I look at your treewide DISABLE_LTO patch, I think "does
that need to be a part of this series, or is it a cleanup that can
stand on its own?" I think it may be the latter? Maybe it would help
to shed one more patch by just sending it on its own, rather than
carrying it in this series? Or did I miss something as to why it
should remain a part of this series?
-- 
Thanks,
~Nick Desaulniers