Enabling CONFIG_KCSAN causes a panic during boot due to an "invalid opcode" in __x86_return_thunk(): invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc2-00316-g91174087dcc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-14-g1e1da7a96300-prebuilt.qemu.org 04/01/2014 RIP: 0010:__x86_return_thunk+0x0/0x10 Code: e8 01 00 00 00 cc e8 01 00 00 00 cc 48 81 c4 80 00 00 00 65 48 c7 04 25 d0 ac 02 00 ff ff ff ff c3 cc 0f 1f 84 00 00 00 00 00 <0f> 0b cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 db 8c 8e fe 0f RSP: 0018:ffffaef1c0013ed0 EFLAGS: 00010246 RAX: ffffffffa0e80eb0 RBX: ffffffffa0f05240 RCX: 0001ffffffffffff RDX: 0000000000000551 RSI: ffffffffa0dcc64e RDI: ffffffffa0f05238 RBP: ffff8f93c11708e0 R08: ffffffffa1387280 R09: 0000000000000000 R10: 0000000000000282 R11: 0001ffffa0f05238 R12: 0000000000000002 R13: 0000000000000282 R14: 0000000000000001 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8f93df000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8f93d6c01000 CR3: 0000000015c2e000 CR4: 0000000000350ef0 The panic is triggered by the UD2 instruction which gets patched into __x86_return_thunk() when alternatives are applied. After that point, the default return thunk should no longer be used. As David Kaplan describes, the issue is caused by a couple of KCSAN-generated constructors which aren't processed by objtool: "When KCSAN is enabled, GCC generates lots of constructor functions named _sub_I_00099_0 which call __tsan_init and then return. The returns in these are generally annotated normally by objtool and fixed up at runtime. But objtool runs on vmlinux.o and vmlinux.o does not include a couple of object files that are in vmlinux, like init/version-timestamp.o and .vmlinux.export.o, both of which contain _sub_I_00099_0 functions. As a result, the returns in these functions are not annotated, and the panic occurs when we call one of them in do_ctors and it uses the default return thunk. This difference can be seen by counting the number of these functions in the object files: $ objdump -d vmlinux.o|grep -c "<_sub_I_00099_0>:" 2601 $ objdump -d vmlinux|grep -c "<_sub_I_00099_0>:" 2603 If these functions are only run during kernel boot, there is no speculation concern." Fix it by disabling KCSAN on version-timestamp.o and .vmlinux.export.o so the extra functions don't get generated. KASAN and GCOV are already disabled for those files. Fixes: 91174087dcc7 ("x86/retpoline: Ensure default return thunk isn't used at runtime") Reported-by: Nathan Chancellor <nathan@xxxxxxxxxx> Closes: https://lore.kernel.org/lkml/20231016214810.GA3942238@dev-arch.thelio-3990X/ Debugged-by: David Kaplan <David.Kaplan@xxxxxxx> Tested-by: Nathan Chancellor <nathan@xxxxxxxxxx> Reviewed-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx> Acked-by: Marco Elver <elver@xxxxxxxxxx> Signed-off-by: Josh Poimboeuf <jpoimboe@xxxxxxxxxx> --- init/Makefile | 1 + scripts/Makefile.vmlinux | 1 + 2 files changed, 2 insertions(+) diff --git a/init/Makefile b/init/Makefile index ec557ada3c12..cbac576c57d6 100644 --- a/init/Makefile +++ b/init/Makefile @@ -60,4 +60,5 @@ include/generated/utsversion.h: FORCE $(obj)/version-timestamp.o: include/generated/utsversion.h CFLAGS_version-timestamp.o := -include include/generated/utsversion.h KASAN_SANITIZE_version-timestamp.o := n +KCSAN_SANITIZE_version-timestamp.o := n GCOV_PROFILE_version-timestamp.o := n diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux index 3cd6ca15f390..c9f3e03124d7 100644 --- a/scripts/Makefile.vmlinux +++ b/scripts/Makefile.vmlinux @@ -19,6 +19,7 @@ quiet_cmd_cc_o_c = CC $@ ifdef CONFIG_MODULES KASAN_SANITIZE_.vmlinux.export.o := n +KCSAN_SANITIZE_.vmlinux.export.o := n GCOV_PROFILE_.vmlinux.export.o := n targets += .vmlinux.export.o vmlinux: .vmlinux.export.o -- 2.41.0