Re: [patch 0/7] improve memcg oom killer robustness v2

On Mon, Sep 09, 2013 at 02:56:59PM +0200, Michal Hocko wrote:
> [Adding Glauber - the full patch is here https://lkml.org/lkml/2013/9/5/319]
> 
> On Mon 09-09-13 14:36:25, Michal Hocko wrote:
> > On Thu 05-09-13 12:18:17, Johannes Weiner wrote:
> > [...]
> > > From: Johannes Weiner <hannes@xxxxxxxxxxx>
> > > Subject: [patch] mm: memcg: do not trap chargers with full callstack on OOM
> > > 
> > [...]
> > > 
> > > To fix this, never do any OOM handling directly in the charge context.
> > > When an OOM situation is detected, let the task remember the memcg and
> > > then handle the OOM (kill or wait) only after the page fault stack is
> > > unwound and about to return to userspace.
> > 
> > OK, this is indeed nicer because the oom setup is trivial and the
> > handling is not split into two parts and everything happens close to
> > out_of_memory where it is expected.
> 
> Hmm, wait a second. I have completely forgot about the kmem charging
> path during the review.
> 
> So while previously memcg_charge_kmem could have oom killed a
> task if it couldn't charge to the u-limit after it managed
> to charge the k-limit, now it would simply fail because there is no
> mem_cgroup_{enable,disable}_oom around the __mem_cgroup_try_charge it
> relies on. The allocation will fail in the end, but I am not sure
> whether the missing oom is an issue for existing use cases.

Kernel sites should be able to handle -ENOMEM, right?  And if this
nests inside a userspace fault, it'll still enter OOM.

> My original objection about oom triggered from kmem paths was that oom
> is not kmem aware so the oom decisions might be totally bogus. But we
> still have that:

Well, k should be a fraction of u+k on any reasonable setup, so there
are always appropriate candidates to take down.

>         /*
>          * Conditions under which we can wait for the oom_killer. Those are
>          * the same conditions tested by the core page allocator
>          */
>         may_oom = (gfp & __GFP_FS) && !(gfp & __GFP_NORETRY);
> 
>         _memcg = memcg;
>         ret = __mem_cgroup_try_charge(NULL, gfp, size >> PAGE_SHIFT,
>                                       &_memcg, may_oom);
> 
> I do not mind having may_oom = false unconditionally in that path but I
> would like to hear from Glauber first.

The patch I just sent to azur puts this conditional into try_charge(),
so I'd just change the kmem site to pass `true'.
--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



