On Tue, 2020-11-03 at 17:31 -0600, Gratian Crisan wrote:
> Hi all,
>
> I apologize for waking up the futex demons (and replying to my own
> email), but ...
>
> Gratian Crisan writes:
> >
> > Brandon and I have been debugging a nasty race that leads to
> > BUG_ON(!newowner) in fixup_pi_state_owner() in futex.c. So far
> > we've only been able to reproduce the issue on 4.9.y-rt kernels.
> > We are still testing if this is a problem for later RT branches.
>
> I was able to reproduce the BUG_ON(!newowner) in fixup_pi_state_owner()
> with a 5.10.0-rc1-rt1 kernel (currently testing 5.10.0-rc2-rt4).

My box says it's generic.

      KERNEL: vmlinux-5.10.0.gb7cbaf5-master.gz
    DUMPFILE: vmcore
        CPUS: 8
        DATE: Wed Nov 4 01:46:56 2020
      UPTIME: 00:02:06
LOAD AVERAGE: 0.25, 0.15, 0.06
       TASKS: 726
    NODENAME: homer
     RELEASE: 5.10.0.gb7cbaf5-master
     VERSION: #26 SMP Tue Nov 3 14:10:35 CET 2020
     MACHINE: x86_64 (3591 Mhz)
      MEMORY: 16 GB
       PANIC: ""
         PID: 4631
     COMMAND: "f_waiter"
        TASK: ffff88818a1fb900 [THREAD_INFO: ffff88818a1fb900]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash.rt> bt
PID: 4631   TASK: ffff88818a1fb900  CPU: 1   COMMAND: "f_waiter"
 #0 [ffff88816a0b3a58] machine_kexec at ffffffff8104b2dc
 #1 [ffff88816a0b3aa0] __crash_kexec at ffffffff810fc97a
 #2 [ffff88816a0b3b60] crash_kexec at ffffffff810fda55
 #3 [ffff88816a0b3b70] oops_end at ffffffff81021813
 #4 [ffff88816a0b3b90] do_trap at ffffffff8101eaec
 #5 [ffff88816a0b3be0] do_error_trap at ffffffff8101ebd5
 #6 [ffff88816a0b3c20] exc_invalid_op at ffffffff816d8bdb
 #7 [ffff88816a0b3c40] asm_exc_invalid_op at ffffffff81800a62
 #8 [ffff88816a0b3cc8] fixup_pi_state_owner at ffffffff810f065c
 #9 [ffff88816a0b3d58] futex_wait_requeue_pi.constprop.0 at ffffffff810f1fcb
#10 [ffff88816a0b3ec8] do_futex at ffffffff810f2482
#11 [ffff88816a0b3ed8] __x64_sys_futex at ffffffff810f2ab5
#12 [ffff88816a0b3f40] do_syscall_64 at ffffffff816d88c3
#13 [ffff88816a0b3f50] entry_SYSCALL_64_after_hwframe at ffffffff8180007c
    RIP: 00007f665b94f839  RSP: 00007f665b056e88  RFLAGS: 00000212
    RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f665b94f839
    RDX: 0000000000000509  RSI: 000000000000008b  RDI: 00000000006020c0
    RBP: 00007f665b056ed0   R8: 00000000006020c4   R9: 0000000000000000
    R10: 00007f665b056ef0  R11: 0000000000000212  R12: 00007ffd42284c3e
    R13: 00007ffd42284c3f  R14: 0000000000000000  R15: 00007ffd42284c40
    ORIG_RAX: 00000000000000ca  CS: 0033  SS: 002b
crash.rt> gdb list *0xffffffff810f065c
0xffffffff810f065c is in fixup_pi_state_owner (kernel/futex.c:2386).
2381
2382			/*
2383			 * Since we just failed the trylock; there must be an owner.
2384			 */
2385			newowner = rt_mutex_owner(&pi_state->pi_mutex);
2386			BUG_ON(!newowner);
2387		} else {
2388			WARN_ON_ONCE(argowner != current);
2389			if (oldowner == current) {
2390				/*
crash.rt>
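
For what it's worth, the user-space registers in that dump can be cross-checked against the backtrace. A minimal sketch, assuming the x86_64 syscall convention (ORIG_RAX holds the syscall number, RSI the second argument, i.e. the futex op) and the op encoding from the UAPI futex header. The constants are copied by value and the helper names futex_cmd() and futex_is_private() are made up for illustration, not kernel functions:

```c
/* Op encoding as in the Linux UAPI header <linux/futex.h>, copied by value. */
#define FUTEX_WAIT_REQUEUE_PI  11
#define FUTEX_PRIVATE_FLAG     128
#define FUTEX_CLOCK_REALTIME   256
#define FUTEX_CMD_MASK         (~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME))

/* x86_64 syscall number for futex(2). */
#define NR_FUTEX_X86_64        202

/* Hypothetical helper: strip the flag bits and return the bare futex command. */
int futex_cmd(unsigned long op)
{
	return (int)(op & FUTEX_CMD_MASK);
}

/* Hypothetical helper: nonzero if the op carries FUTEX_PRIVATE_FLAG. */
int futex_is_private(unsigned long op)
{
	return (op & FUTEX_PRIVATE_FLAG) != 0;
}
```

Feeding it RSI = 0x8b gives command 11 (FUTEX_WAIT_REQUEUE_PI) with the private flag set, and ORIG_RAX = 0xca is 202, the futex syscall, so the registers agree with the futex_wait_requeue_pi() frame in the trace.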