Re: [PATCH] mm,oom: Use timeout based back off.

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 24 Oct 2018 15:54:54 -0700

On Mon, 22 Oct 2018 14:11:10 -0700 (PDT) David Rientjes <rientjes@xxxxxxxxxx> wrote:

> > Michal has been refusing a timeout based approach, but I don't think we
> > need to be this nervous about the possibility of overlooking races/bugs
> > just because Michal does not want to use a timeout. I believe that a
> > timeout based back off is the only approach we can use for now.
> > 
> 
> I've proposed patches that have been running for months in a production 
> environment that make the oom killer useful without serially killing many 
> processes unnecessarily.  At this point, it is *much* easier to just fork 
> the oom killer logic rather than continue to invest time into fixing it in 
> Linux.  That's unfortunate because I'm sure you realize how problematic 
> the current implementation is, how abusive it is, and have seen its 
> effects yourself.  I admire your persistence in trying to fix the issues 
> surrounding the oom killer, but have come to the conclusion that forking 
> it is a much better use of time.

The oom killer is, I think, fairly standalone and it shouldn't be too
hard to add the infrastructure to make the whole thing pluggable.  At
runtime, not at build time.
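The following minimal userspace sketch in C illustrates what "pluggable at runtime, not at build time" could look like. It assumes a policy is nothing more than a function that picks a victim from a task list. All names (struct task, struct oom_policy, oom_policy_register(), oom_policy_select(), oom_select_victim() and the two sample policies) are hypothetical and are not an existing kernel interface. The adj_weighted policy only loosely mimics the kernel's badness heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical task descriptor; not the kernel's struct task_struct. */
struct task {
    int pid;
    long rss_pages;     /* resident memory, in pages */
    int oom_score_adj;  /* -1000..1000, as in /proc/<pid>/oom_score_adj */
};

/* Hypothetical ops table: a policy only has to choose a victim. */
struct oom_policy {
    const char *name;
    /* Returns the index of the chosen victim, or -1 if none is eligible. */
    int (*select_victim)(const struct task *tasks, size_t n, long total_pages);
};

#define MAX_POLICIES 8
static const struct oom_policy *policies[MAX_POLICIES];
static size_t nr_policies;
static const struct oom_policy *active_policy;

static void oom_policy_register(const struct oom_policy *p)
{
    assert(nr_policies < MAX_POLICIES);
    policies[nr_policies++] = p;
    if (!active_policy)
        active_policy = p;  /* first registered policy is the default */
}

/* Switch policies at runtime by name, e.g. from a sysctl-like knob. */
static int oom_policy_select(const char *name)
{
    for (size_t i = 0; i < nr_policies; i++) {
        if (strcmp(policies[i]->name, name) == 0) {
            active_policy = policies[i];
            return 0;
        }
    }
    return -1;  /* unknown name: keep the current policy */
}

/* Sample policy: kill the task with the largest RSS, ignoring adj. */
static int largest_rss(const struct task *t, size_t n, long total_pages)
{
    (void)total_pages;
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || t[i].rss_pages > t[best].rss_pages)
            best = (int)i;
    return best;
}

/* Sample policy: RSS biased by oom_score_adj; -1000 means "never kill". */
static int adj_weighted(const struct task *t, size_t n, long total_pages)
{
    int best = -1;
    long best_points = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].oom_score_adj == -1000)
            continue;
        long points = t[i].rss_pages +
                      (long)t[i].oom_score_adj * total_pages / 1000;
        if (best < 0 || points > best_points) {
            best = (int)i;
            best_points = points;
        }
    }
    return best;
}

static const struct oom_policy largest_rss_policy = { "largest_rss", largest_rss };
static const struct oom_policy adj_weighted_policy = { "adj_weighted", adj_weighted };

/* The OOM path always calls through whichever policy is currently active. */
static int oom_select_victim(const struct task *tasks, size_t n, long total_pages)
{
    return active_policy ? active_policy->select_victim(tasks, n, total_pages) : -1;
}
```

Because every policy is compiled in and chosen by name through a single indirection, switching between them needs no rebuild, and the rest of the OOM path does not care which one is active. The cost is the one raised next: every additional policy is another implementation that users have to choose between.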

But that is a last resort: it would result in fragmented effort and
difficult decisions for everyone about which implementation to use.

There has been a lot of heat and noise and confusion and handwaving in
all of this.  What we're crying out for is simple testcases which
everyone can run.  Find a problem, write the testcase, distribute that.
Develop a solution for that testcase then move on to the next one.