On Tue, Jun 30, 2020 at 4:42 PM Neal Gompa <ngompa13@xxxxxxxxx> wrote:
>
> On Tue, Jun 30, 2020 at 6:30 PM Kevin Kofler <kevin.kofler@xxxxxxxxx> wrote:
> >
> > I am opposed to this change, for the same reasons I was opposed to EarlyOOM
> > to begin with: this solution kills processes under arbitrary conditions that
> > are neither necessary nor sufficient to prevent a swap thrashing collapse
> > (i.e., it can both kill processes unnecessarily (false positives) and fail
> > to kill processes when it would be necessary to prevent swap thrashing
> > (false negatives)). It is also unclear that it can prevent full OOM (both
> > RAM and swap completely full) in all cases, due to EarlyOOM being a
> > userspace process that races with the memory-consuming processes and that
> > may end up not getting scheduled due to the very impending OOM condition
> > that it is trying to prevent.
> >
> > I strongly believe that this kernel problem can only be solved within the
> > kernel, by a synchronous (hence race-free) hook on all allocations.
> >
>
> I still believe that too, except *nobody cares in the Linux kernel
> community*. This problem has existed for a decade, and nobody in the
> Linux kernel community is interested in fixing it. At this point, I've
> given up. Most of us have given up. If they ever get around to doing
> the right thing, I'll happily punt all this stuff, but they won't, so
> here we are.

To put a fine nuance on this, the best oversimplification I've come up
with that squares with what the mm kernel developers say in this area
is: the kernel's built-in oomkiller cares about the kernel's survival.
It's not concerned about user space directly, beyond being reasonably
fair (i.e., approximately equal misery for all processes, even though
one in particular does deserve to be killed off the most). Once the
kernel's ability to manage the system itself comes under pressure?
That's when oomkiller will knock that shit off.

What the mm and cgroup2 folks have done is provide some knobs that make
it possible to (a) control processes' consumption of resources at a
very granular level and (b) provide the necessary isolation for a user
space oom killer to do things more intelligently. That's oomd today,
and it might be systemd-oomd down the road in a form that's simpler and
doesn't require any tuning - it can in effect come just from proper
resource control being applied.

So... like. You're both correct, as weird as that might seem at first
and second glances.

But yeah, earlyoom is known and intended to be a bit simplistic, and
it's not always going to do what you want, but really we're trying to
knock off the worst instances of these problems. Because frankly the
resource control picture isn't yet complete, so not all of this cgroup2
stuff is done on the desktop.

And also it's one of the top 3 reasons for btrfs by default, because
it's the only file system right now that the cgroup2 guys tell us they
know does IO isolation properly. It is not a hard requirement to use
btrfs to get improved resource control, but since memory and IO
pressures are tightly coupled, to have a complete picture we need IO
isolation. That doesn't mean you're lost without btrfs every time. But
some percent of the time (whatever it is) the workload will be such
that IO isolation is needed, and if we don't have it, the control won't
be quite as good. And yeah, we'll have to test it and try to break it
because that's fun!
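To make those knobs a bit more concrete, here's a rough Python sketch
of the cgroup2 interface all of this builds on: memory.high to throttle
a group before the kernel ever has to OOM kill anything, and
memory.pressure (PSI) for a user space killer to watch. The cgroup path
below is made up, and this is not how oomd or systemd-oomd are actually
implemented, just an illustration of the plumbing they use.

#!/usr/bin/env python3
# Sketch only: assumes a user-owned cgroup2 group at a hypothetical path.

CGROUP = "/sys/fs/cgroup/user.slice/user-1000.slice/my-app.scope"  # hypothetical

def set_memory_high(limit_bytes: int) -> None:
    # memory.high is the throttling limit: above it the kernel reclaims
    # aggressively instead of letting the group grow toward an OOM kill.
    with open(f"{CGROUP}/memory.high", "w") as f:
        f.write(str(limit_bytes))

def read_memory_pressure() -> str:
    # memory.pressure exposes PSI (pressure stall information), which is
    # what a user space oom killer can watch to see whether a group is
    # actually stalling on memory, not just using a lot of it.
    with open(f"{CGROUP}/memory.pressure") as f:
        return f.read()

if __name__ == "__main__":
    set_memory_high(2 * 1024**3)   # throttle this group above ~2 GiB
    print(read_memory_pressure())  # "some avg10=... avg60=... avg300=... total=..."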
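And to show what I mean by earlyoom being simplistic, the whole idea
fits in a few lines: poll /proc/meminfo, and when both available memory
and free swap drop under some threshold, signal the process the kernel
already ranks highest by oom_score. The thresholds and signaling below
are illustrative, not earlyoom's actual code or defaults, and it would
need to run as root to kill other users' processes.

#!/usr/bin/env python3
# Minimal earlyoom-style sketch (illustrative thresholds, not real defaults).
import os
import signal
import time

MEM_MIN_PERCENT = 10   # assumed threshold for this sketch
SWAP_MIN_PERCENT = 10  # assumed threshold for this sketch

def meminfo() -> dict:
    # Parse /proc/meminfo into {field: value-in-kB}.
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])
    return info

def pick_victim():
    # Highest oom_score = the process the kernel itself considers most
    # "deserving"; earlyoom reuses that ranking rather than inventing one.
    best_pid, best_score = None, -1
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/oom_score") as f:
                score = int(f.read())
        except (FileNotFoundError, PermissionError):
            continue
        if score > best_score:
            best_pid, best_score = int(pid), score
    return best_pid

if __name__ == "__main__":
    while True:
        m = meminfo()
        mem_pct = 100 * m["MemAvailable"] / m["MemTotal"]
        swap_pct = 100 * m["SwapFree"] / m["SwapTotal"] if m["SwapTotal"] else 0
        if mem_pct < MEM_MIN_PERCENT and swap_pct < SWAP_MIN_PERCENT:
            victim = pick_victim()
            if victim is not None:
                os.kill(victim, signal.SIGTERM)  # earlyoom escalates to SIGKILL if needed
        time.sleep(1)

You can see both the appeal (it acts before the box collapses into swap
thrashing) and the limits (percent thresholds know nothing about actual
pressure or about which process is really at fault).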
So I think it's a good idea to use earlyoom, and it's simple enough for
folks to tweak, if they want to, and learn more about oom stuff. And
maybe they decide they want to know even more, and end up looking at
nohang, which is a more sophisticated user space oom killer. And they
can learn a ton more about how this is actually a hard problem, and how
people are learning all kinds of new things about solving it. And then
if they get really far down the rabbit hole, maybe they want to do some
more cgroup2 and resource control work with their own apps, or even
contribute at the desktop or kernel level. Why not?

--
Chris Murphy