On Tue, Jun 30, 2020 at 4:42 PM Neal Gompa <ngompa13@xxxxxxxxx> wrote:
>
> On Tue, Jun 30, 2020 at 6:30 PM Kevin Kofler <kevin.kofler@xxxxxxxxx> wrote:
> >
> > I am opposed to this change, for the same reasons I was opposed to EarlyOOM
> > to begin with: this solution kills processes under arbitrary conditions that
> > are neither necessary nor sufficient to prevent a swap thrashing collapse
> > (i.e., it can both kill processes unnecessarily (false positives) and fail
> > to kill processes when it would be necessary to prevent swap thrashing
> > (false negatives)). It is also unclear that it can prevent full OOM (both
> > RAM and swap completely full) in all cases, due to EarlyOOM being a
> > userspace process that races with the memory-consuming processes and that
> > may end up not getting scheduled due to the very impending OOM condition
> > that it is trying to prevent.
> >
> > I strongly believe that this kernel problem can only be solved within the
> > kernel, by a synchronous (hence race-free) hook on all allocations.
> >
>
> I still believe that too, except *nobody cares in the Linux kernel
> community*. This problem has existed for a decade, and nobody in the
> Linux kernel community is interested in fixing it. At this point, I've
> given up. Most of us have given up. If they ever get around to doing
> the right thing, I'll happily punt all this stuff, but they won't, so
> here we are.

To put a fine nuance on this, the best oversimplification I've come up
with that squares with what the mm kernel developers say in this area
is: the kernel's built-in oomkiller cares about the kernel's survival.
It's not concerned about user space directly, beyond being reasonably
fair (i.e., approximately equal misery for all processes, even though
one in particular does deserve to be killed off the most). Once the
kernel's ability to manage the system itself comes under pressure?
That's when oomkiller will knock that shit off.

What the mm and cgroup2 folks have done is provide some knobs that make
it possible to (a) control processes' consumption of resources at a
very granular level and (b) provide the necessary isolation for a user
space oom killer to do things more intelligently. That's oomd today,
and it might be systemd-oomd down the road in a form that's simpler and
doesn't require any tuning - it can in effect come just from proper
resource control being applied.

So... like. You're both correct, as weird as that might seem at first
and second glances.

But yeah, earlyoom is known and intended to be a bit simplistic, and
it's not always going to do what you want, but really we're trying to
knock off the worst instances of these problems. Because frankly the
resource control picture isn't yet complete, so not all of this cgroup2
stuff is done on the desktop.

And also it's one of the top 3 reasons for btrfs by default, because
it's the only file system right now that the cgroup2 guys tell us they
know does IO isolation properly. It is not a hard requirement to use
btrfs to get improved resource control, but since memory and IO
pressures are tightly coupled, to have a complete picture we need IO
isolation. That doesn't mean you're lost without btrfs every time. But
some percent of the time (whatever it is) the workload will be such
that IO isolation is needed, and if we don't have it, the control won't
be quite as good. And yeah, we'll have to test it and try to break it
because that's fun!
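To make those knobs a bit more concrete, here's a rough Python sketch
of the cgroup2 interface all of this builds on: memory.high to throttle
a group before the kernel ever has to OOM kill anything, and
memory.pressure (PSI) for a user space killer to watch. The cgroup path
below is made up, and this is not how oomd or systemd-oomd are actually
implemented, just an illustration of the plumbing they use.

#!/usr/bin/env python3
# Sketch only: assumes a user-owned cgroup2 group at a hypothetical path.

CGROUP = "/sys/fs/cgroup/user.slice/user-1000.slice/my-app.scope"  # hypothetical

def set_memory_high(limit_bytes: int) -> None:
    # memory.high is the throttling limit: above it the kernel reclaims
    # aggressively instead of letting the group grow toward an OOM kill.
    with open(f"{CGROUP}/memory.high", "w") as f:
        f.write(str(limit_bytes))

def read_memory_pressure() -> str:
    # memory.pressure exposes PSI (pressure stall information), which is
    # what a user space oom killer can watch to see whether a group is
    # actually stalling on memory, not just using a lot of it.
    with open(f"{CGROUP}/memory.pressure") as f:
        return f.read()

if __name__ == "__main__":
    set_memory_high(2 * 1024**3)   # throttle this group above ~2 GiB
    print(read_memory_pressure())  # "some avg10=... avg60=... avg300=... total=..."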
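And to show what I mean by earlyoom being simplistic, the whole idea
fits in a few lines: poll /proc/meminfo, and when both available memory
and free swap drop under some threshold, signal the process the kernel
already ranks highest by oom_score. The thresholds and signaling below
are illustrative, not earlyoom's actual code or defaults, and it would
need to run as root to kill other users' processes.

#!/usr/bin/env python3
# Minimal earlyoom-style sketch (illustrative thresholds, not real defaults).
import os
import signal
import time

MEM_MIN_PERCENT = 10   # assumed threshold for this sketch
SWAP_MIN_PERCENT = 10  # assumed threshold for this sketch

def meminfo() -> dict:
    # Parse /proc/meminfo into {field: value-in-kB}.
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])
    return info

def pick_victim():
    # Highest oom_score = the process the kernel itself considers most
    # "deserving"; earlyoom reuses that ranking rather than inventing one.
    best_pid, best_score = None, -1
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/oom_score") as f:
                score = int(f.read())
        except (FileNotFoundError, PermissionError):
            continue
        if score > best_score:
            best_pid, best_score = int(pid), score
    return best_pid

if __name__ == "__main__":
    while True:
        m = meminfo()
        mem_pct = 100 * m["MemAvailable"] / m["MemTotal"]
        swap_pct = 100 * m["SwapFree"] / m["SwapTotal"] if m["SwapTotal"] else 0
        if mem_pct < MEM_MIN_PERCENT and swap_pct < SWAP_MIN_PERCENT:
            victim = pick_victim()
            if victim is not None:
                os.kill(victim, signal.SIGTERM)  # earlyoom escalates to SIGKILL if needed
        time.sleep(1)

You can see both the appeal (it acts before the box collapses into swap
thrashing) and the limits (percent thresholds know nothing about actual
pressure or about which process is really at fault).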
So I think it's a good idea to use earlyoom, and it's simple enough for
folks to tweak, if they want to, and learn more about oom stuff. And
maybe they decide they want to know even more, and end up looking at
nohang, which is a more sophisticated user space oom killer. And they
can learn a ton more about how this is actually a hard problem, and how
people are learning all kinds of new things about solving it. And then
if they get really far down the rabbit hole, maybe they want to do some
more cgroup2 and resource control work with their own apps, or even
contribute at the desktop or kernel level. Why not?

--
Chris Murphy