Re: cgroup-aware OOM killer, how to move forward

Tejun Heo <tj@xxxxxxxxxx> · Tue, 24 Jul 2018 07:35:04 -0700

Hello,

Lemme elaborate just a bit more.

On Tue, Jul 24, 2018 at 07:28:20AM -0700, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jul 24, 2018 at 04:25:54PM +0200, Michal Hocko wrote:
> > I am sorry but I do not follow. Besides that modeling the behavior on
> > panic_on_oom doesn't really sound very appealing to me. The knob is a
> > crude hack mostly motivated by debugging (at least its non-global
> > variants).
> 
> Hmm... we actually do use that quite a bit in production (moving away
> from it gradually).

So, the reason panic_on_oom is used is very similar to the reason one
would want group oom kill - workload integrity after an oom kill.
panic_on_oom is an expensive way of avoiding partial kills and the
resulting possibly inconsistent state.  Group oom can scope that down
so that we can maintain integrity per-application or domain rather
than at the system level, which makes it way cheaper.

> > So can we get back to workloads and shape the semantic on top of that
> > please?
> 
> I didn't realize we were that off track.  Don't both map to what we
> were discussing almost perfectly?

I guess the reason why panic_on_oom developed the two behaviors is
likely that the initial behavior - panicking on any oom - was too
inflexible.  We're scoping it down, so whatever problems we used to
have with panic_on_oom are less pronounced with group oom.  So, I don't
think this matters all that much in terms of practical usefulness.
Both always killing and factoring in oom origin seem fine to me.
Let's just pick one.

Thanks.

-- 
tejun