On Thu, Apr 07, 2022 at 11:28:09PM -0400, Nico Pache wrote: > The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can > be targeted by the oom reaper. This mapping is used to store the futex > robust list head; the kernel does not keep a copy of the robust list and > instead references a userspace address to maintain the robustness during > a process death. A race can occur between exit_mm and the oom reaper that > allows the oom reaper to free the memory of the futex robust list before > the exit path has handled the futex death: > > CPU1 CPU2 > ------------------------------------------------------------------------ > page_fault > do_exit "signal" > wake_oom_reaper > oom_reaper > oom_reap_task_mm (invalidates mm) > exit_mm > exit_mm_release > futex_exit_release > futex_cleanup > exit_robust_list > get_user (EFAULT- can't access memory) > > If the get_user EFAULT's, the kernel will be unable to recover the > waiters on the robust_list, leaving userspace mutexes hung indefinitely. > > Use the robust_list address stored in the kernel to skip the VMA that holds > it, allowing a successful futex_cleanup. > > Theoretically a failure can still occur if there are locks mapped as > PRIVATE|ANON; however, the robust futexes are a best-effort approach. > This patch only strengthens that best-effort. > > The following case can still fail: > robust head (skipped) -> private lock (reaped) -> shared lock (skipped) This is still all sorts of confused.. it's a list head, the entries can be in any random other VMA. You must not remove *any* user memory before doing the robust thing. Not removing the VMA that contains the head is pointless in the extreme. Did you not read the previous discussion?