On Tue, 5 Nov 2019, Thomas Gleixner wrote:
>
> I'm a moron. It's vfork() not fork() so the behaviour is expected.
>
> Staring more at the trace which shows me where this goes down the drain.

 parent                          child

 set FIFO prio 2

 vfork()                    ->   set FIFO prio 1
  implies wait_for_child()       sched_setscheduler(...)
                                 exit()
                                 do_exit()
                                 tsk->flags |= PF_EXITING;
                                 ....
                                 mm_release()
                                   exit_futex(); (NOOP in this case)
                                   complete() --> wakes parent
 sys_futex()
    loop infinite because
       PF_EXITING is set,
       but PF_EXITPIDONE not

So the obvious question is why PF_EXITPIDONE is set way after the futex
exit cleanup has run, but moving this right after exit_futex() would not
solve the exit race completely because the code after setting PF_EXITING
is preemptible. So the same crap could happen just by preemption:

  task holds futex
  ...
  do_exit()
     tsk->flags |= PF_EXITING;

preemption (unrelated wakeup of some other higher prio task, e.g. timer)

  switch_to(other_task)

  return to user
  sys_futex()
     loop infinite as above

And just for the fun of it the futex exit cleanup could trigger the wakeup
itself before PF_EXITPIDONE is set.

There is some other issue which I need to look up again. That's a slightly
different problem but related to futex exit race conditions.

The way we can deal with that is:

  do_exit()
     tsk->flags |= PF_EXITING;
     ...
     mutex_lock(&tsk->futex_exit_mutex);
     futex_exit();
     tsk->flags |= PF_EXITPIDONE;
     mutex_unlock(&tsk->futex_exit_mutex);

and on the futex lock_pi side:

     if (!(tsk->flags & PF_EXITING))
        return 0;               <- All good

     if (tsk->flags & PF_EXITPIDONE)
        return -EOWNERDEAD;     <- Locker can take over

     mutex_lock(&tsk->futex_exit_mutex);
     if (tsk->flags & PF_EXITPIDONE) {
        mutex_unlock(&tsk->futex_exit_mutex);
        return -EOWNERDEAD;     <- Locker can take over
     }

     queue_futex();
     mutex_unlock(&tsk->futex_exit_mutex);

Not that I think it's pretty, but it plugs all holes AFAICT.

Thanks,

	tglx
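The futex_exit_mutex handshake proposed above can be illustrated with a minimal user-space sketch, assuming pthreads stand in for kernel tasks and kernel mutexes. Everything here is hypothetical and not kernel code: the PF_* bit values, struct task, the queued/cleaned counters, task_begin_exit()/task_finish_exit() (do_exit() split at the preemptible window), lock_pi_check_owner() and race_once(). Returning 0 after the queue_futex() stand-in is also an assumption, since the mail does not say what the locker returns in that case. The point it shows: the locker never spins on PF_EXITING; it either returns immediately or sleeps on the mutex until the exit cleanup has finished, and any waiter queued inside the window is seen by futex_exit().

```c
/* Hypothetical user-space model of the proposed do_exit()/lock_pi protocol. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

#define PF_EXITING    0x1u  /* illustrative values, not the kernel's */
#define PF_EXITPIDONE 0x2u

struct task {
    atomic_uint flags;                /* PF_* bits, read locklessly by lockers */
    pthread_mutex_t futex_exit_mutex;
    int queued;                       /* waiters queued while exit was in flight */
    int cleaned;                      /* waiters handled by futex_exit() */
};

static void task_init(struct task *t)
{
    atomic_init(&t->flags, 0);
    pthread_mutex_init(&t->futex_exit_mutex, NULL);
    t->queued = 0;
    t->cleaned = 0;
}

/* Stand-in for futex_exit(): runs under the mutex, sees every queued waiter. */
static void futex_exit(struct task *t)
{
    assert(atomic_load(&t->flags) & PF_EXITING);
    t->cleaned += t->queued;
    t->queued = 0;
}

/* First half of do_exit(): the preemptible window opens here. */
static void task_begin_exit(struct task *t)
{
    atomic_fetch_or(&t->flags, PF_EXITING);
}

/* Second half of do_exit(): cleanup and PF_EXITPIDONE under the mutex. */
static void task_finish_exit(struct task *t)
{
    pthread_mutex_lock(&t->futex_exit_mutex);
    futex_exit(t);
    atomic_fetch_or(&t->flags, PF_EXITPIDONE);
    pthread_mutex_unlock(&t->futex_exit_mutex);
}

/*
 * lock_pi side: returns 0 if the owner is alive or the waiter was queued
 * before futex_exit() ran, -EOWNERDEAD if the locker can take over.
 * It blocks on the mutex at most once and never loops.
 */
static int lock_pi_check_owner(struct task *owner)
{
    unsigned int flags = atomic_load(&owner->flags);

    if (!(flags & PF_EXITING))
        return 0;                     /* all good */
    if (flags & PF_EXITPIDONE)
        return -EOWNERDEAD;           /* locker can take over */

    pthread_mutex_lock(&owner->futex_exit_mutex);
    if (atomic_load(&owner->flags) & PF_EXITPIDONE) {
        pthread_mutex_unlock(&owner->futex_exit_mutex);
        return -EOWNERDEAD;           /* locker can take over */
    }
    owner->queued++;                  /* stand-in for queue_futex() */
    pthread_mutex_unlock(&owner->futex_exit_mutex);
    return 0;
}

static void *exiter(void *arg)
{
    task_begin_exit(arg);
    task_finish_exit(arg);
    return NULL;
}

/* Race one locker against one exiting owner; returns waiters left stranded. */
static int race_once(void)
{
    struct task owner;
    pthread_t th;

    task_init(&owner);
    if (pthread_create(&th, NULL, exiter, &owner) != 0)
        return -1;
    lock_pi_check_owner(&owner);
    pthread_join(th, NULL);
    pthread_mutex_destroy(&owner.futex_exit_mutex);
    return owner.queued;
}
```

A waiter can only be queued while holding the mutex with PF_EXITPIDONE still clear, so futex_exit() necessarily runs after it and picks it up; race_once() therefore always reports zero stranded waiters in this model.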