On Wed, Oct 18, 2023 at 08:54:33AM -0700, Josh Poimboeuf wrote: > On Wed, Oct 18, 2023 at 05:12:45PM +0200, Borislav Petkov wrote: > > On Wed, Oct 18, 2023 at 03:38:56PM +0200, Ingo Molnar wrote: > > > If then WARN_ONCE(). > > > > WARN_ONCE() is not enough considering that if this fires, it means we're > > not really properly protected against one of those RET-speculation > > things. > > > > It needs to be warning constantly but then still allow booting. I.e, > > a ratelimited warn of sorts but I don't think we have that... yet. > > I'm not sure a rate-limited WARN() would be a good thing. Either the > user is regularly checking dmesg (most likely in some automated fashion) > or they're not. If the latter, a rate-limited WARN() would wrap dmesg > pretty quickly. Well, freezing the box without any mention about why it happens is not viable either. So for lack of a better solution, overflowing dmesg is all we could do. And, on a related note, I'm thinking I should revert: e92626af3234 ("x86/retpoline: Remove .text..__x86.return_thunk section") after all because I'm debugging another similar issue reported by dhowells. And I can reproduce it on linux-next with his config and gcc-13. The splat looks like this below - and mind you, that's in a VM. On baremetal you get to see only the first warning and output stops. And that happens because for whatever reason apply_returns() can't find that last jmp __x86_return_thunk for %r15 and it barfs. When I revert e92626af3234, it is fixed. It fixes dhowells' box too. Which means, IMHO, objtool is missing to add a return return call site at the end of that __x86_indirect_thunk_r15. And considering how close we are to the merge window, I'd let that .text..__x86.return_thunk section exist so that objtool can find the return sites more reliably that what we currently have. We can always do e92626af3234 later, when it has seen more testing. Now, to the UD2 case - look below at "* first splat". Stack protector fires but there's no #UD exception. Well, there is, it is well hidden: (gdb) bt #0 delay_tsc (cycles=3670543) at arch/x86/lib/delay.c:90 #1 0xffffffff810c706e in panic (fmt=fmt@entry=0xffffffff82504fe4 "stack-protector: Kernel stack is corrupted in: %pB") at kernel/panic.c:456 #2 0xffffffff81d64afb in __stack_chk_fail () at kernel/panic.c:763 #3 0xffffffff810e9333 in notify_die (val=val@entry=DIE_TRAP, str=str@entry=0xffffffff824f49b8 "invalid opcode", regs=regs@entry=0xffff8880794000a8, err=err@entry=0, trap=trap@entry=6, sig=sig@entry=4) at kernel/notifier.c:597 #4 0xffffffff8101d4fb in do_error_trap (regs=regs@entry=0xffff8880794000a8, error_code=error_code@entry=0, str=str@entry=0xffffffff824f49b8 "invalid opcode", trapnr=trapnr@entry=6, signr=signr@entry=4, sicode=sicode@entry=2, addr=0xffffffff81d712a0 <__x86_return_thunk>) at arch/x86/kernel/traps.c:170 #5 0xffffffff81d62355 in handle_invalid_op (regs=0xffff8880794000a8) at ./arch/x86/include/asm/ptrace.h:209a ^^^^^^^^^^^^^^^^ #6 exc_invalid_op (regs=0xffff8880794000a8) at arch/x86/kernel/traps.c:263 #7 0xffffffff81e00a96 in asm_exc_invalid_op () at ./arch/x86/include/asm/idtentry.h:568 #8 0xffffffff81d49ff4 in cmp_ex_sort (a=0x38020f, b=0x761d61d8b) at lib/extable.c:61 #9 0x000000000000000c in fixed_percpu_data () #10 0xffff888079400198 in ?? () #11 0xffffffff826cdc70 in __modver_attr () #12 0x0000000000000270 in ?? () #13 0xffffffff826ceb10 in __start___ex_table () #14 0x0000000000000000 in ?? () (gdb) *while* it runs the #UD exception handler, stackprotector determines that the stack has been corrupted, leading to that panic. And nothing in dmesg tells the user what's really going on. And with the warn, you can actually see it: ------------[ cut here ]------------ Unconverted return thunk WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/bugs.c:2855 check_thunks+0x11/0x1a Modules linked in: ... ---- I still need to figure out, though, how to make check_thunks *not* have a "jmp __x86_return_thunk" at the end itself because it gets loopy. :) * first splat ------------- ... x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. ------------[ cut here ]------------ missing return thunk: __x86_indirect_thunk_r15+0xa/0x5f-0x0: eb 74 66 66 2e WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:755 apply_returns+0xca/0x247 Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-rc6-next-20231018-build3 #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:apply_returns+0xca/0x247 Code: 80 3d bd e1 aa 01 00 75 b4 49 89 d8 48 89 ea 48 89 de c6 05 ab e1 aa 01 01 b9 05 00 00 00 48 c7 c7 45 65 4f 82 e8 0b 10 0a 00 <0f> 0b eb 8f f6 05 36 2e 1f 02 02 74 26 0f b6 54 24 52 48 89 de 48 RSP: 0000:ffffffff82803e30 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff81d7122a RCX: 0000000000000003 RDX: 0000000000000086 RSI: 00000000fff7ffff RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffffffff82803969 R12: ffffffff831fed00 R13: 0000000000000005 R14: ffffffff831fed18 R15: 0000000000013af0 FS: 0000000000000000(0000) GS:ffff888079400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff888003401000 CR3: 0000000002854000 CR4: 00000000001506f0 Call Trace: <TASK> ? __warn+0x8c/0xf6 ? report_bug+0xbf/0x11f ? apply_returns+0xca/0x247 ? handle_bug+0x3c/0x63 ? exc_invalid_op+0x13/0x5d ? asm_exc_invalid_op+0x16/0x20 ? __x86_indirect_thunk_r15+0xa/0x5f ? apply_returns+0xca/0x247 ? __x86_indirect_thunk_r15+0xa/0x5f ? __x86_indirect_thunk_r15+0x19/0x5f ? __x86_indirect_thunk_r15+0xc/0x5f alternative_instructions+0x35/0xe2 arch_cpu_finalize_init+0xba/0xdb start_kernel+0x4a1/0x524 x86_64_start_reservations+0x25/0x25 x86_64_start_kernel+0x73/0x73 secondary_startup_64_no_verify+0x166/0x16b </TASK> ---[ end trace 0000000000000000 ]--- Freeing SMP alternatives memory: 36K pid_max: default: 32768 minimum: 301 LSM: initializing lsm=capability,yama,selinux Yama: becoming mindful. SELinux: Initializing. Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear) Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear) smpboot: CPU0: Intel Core Processor (Haswell, no TSX) (family: 0x6, model: 0x3c, stepping: 0x1) RCU Tasks Rude: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1. RCU Tasks Trace: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1. Performance Events: unsupported p6 CPU model 60 no PMU driver, software events only. signal: max sigframe size: 1776 rcu: Hierarchical SRCU implementation. rcu: Max phase no-delay instances is 1000. NMI watchdog: Perf NMI watchdog permanently disabled smp: Bringing up secondary CPUs ... BUG: spinlock bad magic on CPU#0, swapper/0/1 Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: notify_die+0x52/0x5b CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.6.0-rc6-next-20231018-build3 #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: notify_die+0x52/0x5b ]--- -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette