Re: [PATCH] mm,oom: Use timeout based back off.

Michal Hocko <mhocko@xxxxxxxxxx> · Tue, 23 Oct 2018 07:56:55 +0200

On Mon 22-10-18 14:11:10, David Rientjes wrote:
[...]
> I've proposed patches that have been running for months in a production 
> environment that make the oom killer useful without serially killing many 
> processes unnecessarily.  At this point, it is *much* easier to just fork 
> the oom killer logic rather than continue to invest time into fixing it in 
> Linux.  That's unfortunate because I'm sure you realize how problematic 
> the current implementation is, how abusive it is, and have seen its 
> effects yourself.  I admire your persistence in trying to fix the issues 
> surrounding the oom killer, but have come to the conclusion that forking 
> it is a much better use of time.

These are some pretty strong words for code that tends to work for
most users out there. I do not remember any bug reports except for
artificial stress tests or your quite unspecific claims about an
absolutely catastrophic impact, which are not backed by any specific
details.

I have shown interest in addressing as many issues as possible but I
absolutely detest getting back to the previous state with a
non-deterministic pile of heuristics which was lockup prone and
basically unmaintainable.

Going around with timeouts and potentially exporting them to userspace
might sound attractive for its simplicity, but this should be absolutely
the last resort when a proper solution is too complex (from a code or
maintainability POV). I do not believe we have reached that state yet.
-- 
Michal Hocko
SUSE Labs