On Sun, Jun 11, 2023 at 08:14:25PM -0700, Linus Torvalds wrote:
> On Sun, Jun 11, 2023 at 7:22 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > I guess the regression fix needs a regression fix....
>
> Yup.
>
> From the description of the problem, it sounds like this happens on
> real hardware, no vhost anywhere?
>
> Or maybe Darrick (who doesn't see the issue) is running on raw
> hardware, and you and Zorro are running in a virtual environment?

I'm testing inside VMs and seeing it, I can't speak for anyone else.

....

> So *maybe* this attached patch might fix it? I haven't thought very
> deeply about this, but vhost workers most definitely shouldn't call
> do_coredump(), since they are then not counted.
>
> (And again, I think we should just check that PF_IO_WORKER bit, not
> use this more complex test, but that's a separate and bigger change).
>
>               Linus

>  kernel/signal.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 2547fa73bde5..a1e11ee8537c 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2847,6 +2847,10 @@ bool get_signal(struct ksignal *ksig)
>  		 */
>  		current->flags |= PF_SIGNALED;
>
> +		/* vhost workers don't participate in core dups */
> +		if ((current->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)
> +			goto out;
> +
>  		if (sig_kernel_coredump(signr)) {
>  			if (print_fatal_signals)
>  				print_fatal_signal(ksig->info.si_signo);

That would appear to make things worse. mkfs.xfs hung in Z state on
exit and never returned to the shell.

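For reference, the flag test in the quoted hunk can be evaluated in isolation. What follows is a minimal userspace sketch, not kernel code. The two flag values are stand-ins believed to match include/linux/sched.h in 6.4, but only their being distinct bits matters. The names quoted_test_skips() and print_truth_table() are hypothetical helpers. It assumes, as the patch comment implies, that a vhost worker is a task with PF_USER_WORKER set and PF_IO_WORKER clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for the kernel task flags (assumed values). */
#define PF_IO_WORKER   0x00000010u
#define PF_USER_WORKER 0x00004000u

static_assert((PF_IO_WORKER & PF_USER_WORKER) == 0,
	      "the two flags must be distinct bits");

/*
 * The condition exactly as written in the quoted hunk: returns true
 * when a task with these flags would take the "goto out" branch.
 */
bool quoted_test_skips(unsigned int flags)
{
	return (flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER;
}

/* Hypothetical helper: print the outcome for all four flag combinations. */
void print_truth_table(void)
{
	const struct {
		const char *name;
		unsigned int flags;
	} cases[] = {
		{ "ordinary task (neither flag)", 0u },
		{ "PF_USER_WORKER only",          PF_USER_WORKER },
		{ "PF_IO_WORKER only",            PF_IO_WORKER },
		{ "both flags",                   PF_IO_WORKER | PF_USER_WORKER },
	};

	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		printf("%-30s -> %s\n", cases[i].name,
		       quoted_test_skips(cases[i].flags) ?
		       "goto out" : "falls through");
}
```

Calling print_truth_table() from a small main() shows that, as written, the condition is true for every combination except PF_USER_WORKER alone, so an ordinary task with neither flag set takes the `goto out` branch, while the comment describes the opposite intent. Whether that accounts for the hang reported here is not established by this sketch.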
Also, multiple processes are livelocked like this:

 Sending NMI from CPU 0 to CPUs 1-3:
 NMI backtrace for cpu 2
 CPU: 2 PID: 3409 Comm: pmlogger_farm Not tainted 6.4.0-rc5-dgc+ #1822
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:uprobe_deny_signal+0x5/0x90
 Code: 48 c7 c1 c4 64 62 82 48 c7 c7 d1 64 62 82 e8 b2 39 ec ff e9 70 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 <55> 31 4
 RSP: 0018:ffffc900023abdf0 EFLAGS: 00000202
 RAX: 0000000000000004 RBX: ffff888103b127c0 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000296 RDI: ffffc900023abe70
 RBP: ffffc900023abe60 R08: 0000000000000001 R09: 0000000000000001
 R10: 0000000000000000 R11: ffff88813bd2ccf0 R12: ffff888103b127c0
 R13: ffffc900023abe70 R14: ffff888110413700 R15: ffff888103d26e80
 FS:  00007f35497a4740(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
 CR2: 00007ffd4ca0ce80 CR3: 000000010f7d1000 CR4: 00000000000006e0
 Call Trace:
  <NMI>
  ? show_regs+0x61/0x70
  ? nmi_cpu_backtrace+0x88/0xf0
  ? nmi_cpu_backtrace_handler+0x11/0x20
  ? nmi_handle+0x57/0x150
  ? default_do_nmi+0x49/0x240
  ? exc_nmi+0xf4/0x110
  ? end_repeat_nmi+0x16/0x31
  ? uprobe_deny_signal+0x5/0x90
  ? uprobe_deny_signal+0x5/0x90
  ? uprobe_deny_signal+0x5/0x90
  </NMI>
  <TASK>
  ? get_signal+0x94/0x9b0
  ? signal_setup_done+0x66/0x190
  arch_do_signal_or_restart+0x2f/0x260
  exit_to_user_mode_prepare+0x181/0x1c0
  syscall_exit_to_user_mode+0x16/0x40
  do_syscall_64+0x40/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0023:0xffff888103b127c0
 Code: Unable to access opcode bytes at 0xffff888103b12796.
 RSP: 002b:00007ffd4ca0d0ac EFLAGS: 00000202 ORIG_RAX: 000000000000003d
 RAX: 0000000000000009 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 00007ffd4d20bb9c RDI: 00000000ffffffff
 RBP: 00007ffd4d20bb9c R08: 0000000000000002 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
 R13: 00007ffd4d20bba0 R14: 00005604571fc380 R15: 0000000000000001
  </TASK>
 NMI backtrace for cpu 3
 CPU: 3 PID: 3526 Comm: pmlogger_check Not tainted 6.4.0-rc5-dgc+ #1822
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:fixup_exception+0x72/0x260
 Code: 14 0f 87 03 02 00 00 ff 24 d5 98 67 22 82 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 41 81 cd 00 00 00 40 4d 63 ed 4d 89 6c 24 50 <31> c0 9
 RSP: 0018:ffffc9000275bb58 EFLAGS: 00000083
 RAX: 000000000000000f RBX: ffffffff827d0a4c RCX: ffffffff810c5f95
 RDX: 000000000000000f RSI: ffffffff827d0a4c RDI: ffffc9000275bb28
 RBP: ffffc9000275bb80 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000275bc78
 R13: 000000000000000e R14: 000000008f5ded3f R15: 0000000000000000
 FS:  00007f56a36de740(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
 CR2: 000000008f5ded3f CR3: 000000010dcde000 CR4: 00000000000006e0
 Call Trace:
  <NMI>
  ? show_regs+0x61/0x70
  ? nmi_cpu_backtrace+0x88/0xf0
  ? nmi_cpu_backtrace_handler+0x11/0x20
  ? nmi_handle+0x57/0x150
  ? default_do_nmi+0x49/0x240
  ? exc_nmi+0xf4/0x110
  ? end_repeat_nmi+0x16/0x31
  ? copy_fpstate_to_sigframe+0x1c5/0x3a0
  ? fixup_exception+0x72/0x260
  ? fixup_exception+0x72/0x260
  ? fixup_exception+0x72/0x260
  </NMI>
  <TASK>
  kernelmode_fixup_or_oops+0x49/0x120
  __bad_area_nosemaphore+0x15a/0x230
  ? __bad_area+0x57/0x80
  bad_area_nosemaphore+0x16/0x20
  exc_page_fault+0x323/0x880
  asm_exc_page_fault+0x27/0x30
 RIP: 0010:copy_fpstate_to_sigframe+0x1c5/0x3a0
 Code: 45 89 bc 24 40 25 00 00 f0 41 80 64 24 01 bf e9 f5 fe ff ff be 3c 00 00 00 48 c7 c7 77 9c 5f 82 e8 00 2a 23 00 31 c0 0f 1f 00 <49> 0f 1
 RSP: 0018:ffffc9000275bd28 EFLAGS: 00010246
 RAX: 000000000000000e RBX: 000000008f5de7ec RCX: ffffc9000275bda8
 RDX: 000000008f5ded40 RSI: 000000000000003c RDI: ffffffff825f9c77
 RBP: ffffc9000275bd98 R08: ffffc9000275be30 R09: 0000000000000001
 R10: 0000000000000000 R11: ffffc90000138ff8 R12: ffff8881106527c0
 R13: 000000008f5deb40 R14: ffff888110654d40 R15: ffff88810a653f40
  ? copy_fpstate_to_sigframe+0x1c0/0x3a0
  ? __might_sleep+0x42/0x70
  get_sigframe+0xcd/0x2b0
  ia32_setup_frame+0x61/0x230
  arch_do_signal_or_restart+0x1d1/0x260
  exit_to_user_mode_prepare+0x181/0x1c0
  irqentry_exit_to_user_mode+0x9/0x30
  irqentry_exit+0x33/0x40
  exc_page_fault+0x1b6/0x880
  asm_exc_page_fault+0x27/0x30
 RIP: 0023:0x106527c0
 Code: Unable to access opcode bytes at 0x10652796.
 RSP: 002b:000000008f5ded6c EFLAGS: 00010202
 RAX: 000000000000000b RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 00007ffd8f5df2ec RDI: 00000000ffffffff
 RBP: 00007ffd8f5df2ec R08: 0000000000000000 R09: 00005558962eb526
 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
 R13: 00007ffd8f5df2f0 R14: 00005558962b5e60 R15: 0000000000000001
  </TASK>

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx