On 4/21/22 10:40, Thomas Gleixner wrote: > On Thu, Apr 14 2022 at 10:40, Nico Pache wrote: >> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can >> be targeted by the oom reaper. This mapping is used to store the futex >> robust list head; the kernel does not keep a copy of the robust list and >> instead references a userspace address to maintain the robustness during >> a process death. A race can occur between exit_mm and the oom reaper that >> allows the oom reaper to free the memory of the futex robust list before >> the exit path has handled the futex death: >> >> CPU1 CPU2 >> ------------------------------------------------------------------------ >> page_fault >> do_exit "signal" >> wake_oom_reaper >> oom_reaper >> oom_reap_task_mm (invalidates mm) >> exit_mm >> exit_mm_release >> futex_exit_release >> futex_cleanup >> exit_robust_list >> get_user (EFAULT- can't access memory) >> >> If the get_user EFAULT's, the kernel will be unable to recover the >> waiters on the robust_list, leaving userspace mutexes hung indefinitely. >> >> Delay the OOM reaper, allowing more time for the exit path to perform >> the futex cleanup. >> >> Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer >> >> [1] https://elixir.bootlin.com/glibc/latest/source/nptl/allocatestack.c#L370 > > A link to the original discussion about this would be more useful than a > code reference which is stale tomorrow. The above explanation is good > enough to describe the problem. Hi Andrew, can you please update the link when you add the ACKs. Here is a more stable link: [1] https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 > >> >> +/* >> + * Give the OOM victim time to exit naturally before invoking the oom_reaping. >> + * The timers timeout is arbitrary... the longer it is, the longer the worst >> + * case scenario for the OOM can take. If it is too small, the oom_reaper can >> + * get in the way and release resources needed by the process exit path. >> + * e.g. The futex robust list can sit in Anon|Private memory that gets reaped >> + * before the exit path is able to wake the futex waiters. >> + */ >> +#define OOM_REAPER_DELAY (2*HZ) >> +static void queue_oom_reaper(struct task_struct *tsk) > > Bah. Did you run out of newlines? Glueing that define between the > comment and the function is unreadable. My Enter key hit its cgroup limit for newlines. Andrew, would it be possible to also add a new line when you make the other changes. Sorry about that. > > Other than that. > > Acked-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Thanks!