On Mon, Jul 13, 2020 at 2:21 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Mon 13-07-20 08:01:57, Michal Hocko wrote:
> > On Fri 10-07-20 23:18:01, Yafang Shao wrote:
> [...]
> > > Many threads of a multi-threaded task are running in parallel in a
> > > container on many CPUs. Then many threads trigger OOM at the same
> > > time:
> > >
> > > CPU-1           CPU-2           ...   CPU-n
> > > thread-1        thread-2        ...   thread-n
> > >
> > > wait oom_lock   wait oom_lock   ...   hold oom_lock
> > >
> > >                                       (sigkill received)
> > >
> > >                                       select current as victim
> > >                                       and wakeup oom reaper
> > >
> > >                                       release oom_lock
> > >
> > >                     (MMF_OOM_SKIP set by oom reaper)
> > >
> > >                     (lots of pages are freed)
> > > hold oom_lock
> >
> > Could you be more specific please? The page allocator never waits for
> > the oom_lock and keeps retrying instead. Also __alloc_pages_may_oom
> > tries to allocate with the lock held.
>
> I suspect that you are looking at the memcg oom killer.

Right, these threads were waiting on the oom_lock in
mem_cgroup_out_of_memory().

> Because we do not do
> trylock there for some reason I do not immediately remember off the top
> of my head. If this is really the case then I would recommend looking
> into how the page allocator implements this and following the same
> pattern for memcg as well.
>

That is a good suggestion. But we can't simply trylock the global
oom_lock here, because a task OOM-killing in memcg foo may not help the
tasks in memcg bar. IOW, we need to introduce a per-memcg oom_lock, like
below:

mem_cgroup_out_of_memory
+	if (!mutex_trylock(&memcg->lock))
+		return true;

	if (mutex_lock_killable(&oom_lock))
		return true;

And the memcg tree should also be considered.
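Roughly, the direction would look like the following minimal sketch,
mirroring the trylock pattern in __alloc_pages_may_oom(). The "struct
mutex lock" member of struct mem_cgroup is hypothetical here; mainline
only has a bool oom_lock, driven by mem_cgroup_oom_trylock():

static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg,
				     gfp_t gfp_mask, int order)
{
	bool ret = true;

	/*
	 * Like the page allocator: if another thread in this memcg is
	 * already OOM-killing, do not queue up behind it.  Returning
	 * true pretends progress, so the charge path retries and can
	 * observe the memory freed by the in-flight kill.
	 */
	if (!mutex_trylock(&memcg->lock))	/* hypothetical member */
		return true;

	if (mutex_lock_killable(&oom_lock)) {
		mutex_unlock(&memcg->lock);
		return true;
	}

	/* ... existing victim selection via out_of_memory(&oc) ... */

	mutex_unlock(&oom_lock);
	mutex_unlock(&memcg->lock);
	return ret;
}

For the memcg tree, the trylock would presumably have to cover the whole
subtree, in the spirit of what mem_cgroup_oom_trylock() already does with
its per-memcg flags, so that siblings under a common ooming ancestor do
not kill independently.

-- 
Thanks
Yafang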