On Mon, 22 Oct 2018 14:11:10 -0700 (PDT) David Rientjes <rientjes@xxxxxxxxxx> wrote:

> > Michal has been refusing timeout based approach, but I don't think this
> > is something we have to be frayed around the edge about possibility of
> > overlooking races/bugs just because Michal does not want to use timeout.
> > I believe that timeout based back off is the only approach we can use
> > for now.
> >
>
> I've proposed patches that have been running for months in a production
> environment that make the oom killer useful without serially killing many
> processes unnecessarily. At this point, it is *much* easier to just fork
> the oom killer logic rather than continue to invest time into fixing it in
> Linux. That's unfortunate because I'm sure you realize how problematic
> the current implementation is, how abusive it is, and have seen its
> effects yourself. I admire your persistance in trying to fix the issues
> surrounding the oom killer, but have come to the conclusion that forking
> it is a much better use of time.

The oom killer is, I think, fairly standalone and it shouldn't be too
hard to add the infrastructure to make the whole thing pluggable. At
runtime, not at build time.

But it is a last resort - it will result in fragmented effort and
difficult decisions for everyone regarding which should be used.

There has been a lot of heat and noise and confusion and handwaving in
all of this. What we're crying out for is simple testcases which
everyone can run.

Find a problem, write the testcase, distribute that. Develop a
solution for that testcase then move on to the next one.