Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Sun, 5 Jan 2020 11:09:18 -0700

On Sun, Jan 5, 2020 at 4:43 AM Aleksandra Fedorova <alpha@xxxxxxxxxxxx> wrote:
>
> I wonder how I, as a user, am going to be informed about an
> earlyoom event?

Same as a kernel oom-killer event. Primary source is the journal.
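For anyone who wants to pull those events out of the journal programmatically, here is a minimal sketch. It only builds a journalctl command line and parses its JSON output. The unit name "earlyoom.service" used in the usage note is an assumption about how the service is packaged, and the helper names are made up for illustration.

```python
import json
from datetime import datetime, timezone


def journal_command(unit=None, kernel=False, since="today"):
    """Build a journalctl argv list that emits one JSON object per line."""
    cmd = ["journalctl", "--no-pager", "-o", "json", "--since", since]
    if kernel:
        cmd.append("-k")          # kernel messages, where the oom-killer logs
    if unit:
        cmd += ["-u", unit]       # messages from a single systemd unit
    return cmd


def parse_entries(json_lines):
    """Return (UTC timestamp, message) pairs from `journalctl -o json` output."""
    entries = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        message = record.get("MESSAGE")
        if not isinstance(message, str):
            continue              # binary or missing payloads are skipped
        micros = int(record["__REALTIME_TIMESTAMP"])
        stamp = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
        entries.append((stamp, message))
    return entries
```

To use it, pass the list from journal_command(unit="earlyoom.service") or journal_command(kernel=True) to subprocess.run with capture_output=True and text=True, then hand the resulting stdout to parse_entries.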

But well before either earlyoom sends SIGTERM or the kernel oom-killer
kills something, the user will know something is wrong, because the
system will be stuttering or even intermittently hanging. Earlyoom is
not aggressively clobbering things, except on system configurations
that have no swap device. That configuration probably needs some
earlyoom tweaking, and we're looking at that, but those folks also
aren't experiencing much reduced system responsiveness in these cases,
because their systems can't swap heavily.

> I assume abrt will recognize the crash? Will it be
> easily visible from the abrt report that it was the OOM?

No. It's not a crash. Earlyoom sends SIGTERM first, and only sends
SIGKILL if the process isn't responding in time to SIGTERM. And the
kernel oom-killer also doesn't result in an abrt report.
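To make the SIGTERM-then-SIGKILL order concrete, here is a small sketch of that escalation pattern applied to a child process. It is not earlyoom's code. The grace period is a made-up parameter for illustration, and earlyoom's actual escalation criteria aren't described in this thread.

```python
import subprocess


def terminate_then_kill(proc, grace_seconds=5.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the name of the last signal that was sent.
    """
    proc.terminate()                      # SIGTERM on POSIX: a polite request
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()                       # reap the child so it doesn't linger as a zombie
        return "SIGKILL"
```

A process that handles SIGTERM gets the chance to exit cleanly, which is why this isn't a crash from abrt's point of view.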

> The concern is: if we enable such a service, will we get a large number
> of vague bug reports from users who don't understand what has
> happened? Can we make it somehow easier to debug?

Unless further real world testing uncovers something very new and
different from my testing (entirely possible, but I can't estimate
that probability), there won't be a measurable increase in bug reports
related to this.

Based on my limited testing (I've done more than 200 tests of the
kernel oom-killer, earlyoom SIGTERM (I have never seen a SIGKILL), and
nohang; perhaps 80 of those tests involved a forced power off during
heavy swap, compile, and system use), there really isn't anything that
requires the user to get involved.

Also, there isn't a bug here per se. It's a series of intentional
designs that collide to produce a deeply problematic user experience
on the desktop: the "operating system", i.e. Fedora Workstation
providing the kernel, systemd, a bunch of services, libraries, and
policies, permits an unprivileged process to ask for essentially
unlimited resources and overcommit the system, *and* then heavy swap
use results in compromised system responsiveness and control.

Earlyoom doesn't change any of that; it just selects a process for
SIGTERM much sooner than the kernel oom-killer would. That only stops
the bad experience, by stopping the resource-hogging process. It isn't
actually fixing anything. It is in some sense an act of desperation
that's been a long time coming. Arguably, earlyoom isn't aggressive
enough: it doesn't stop the badness soon enough.
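As a rough illustration of what "much sooner" can mean, here is a sketch of a threshold check against /proc/meminfo: step in once both available memory and free swap fall below some percentage. The 10% defaults are placeholders rather than values taken from this thread, and treating a system without swap as having 0% free swap is an assumption made for the sketch. It does show why a swapless configuration ends up gated on the memory threshold alone.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (kiB for sizes)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def percent(part, total):
    """Percentage of total, defined as 0.0 when total is zero (e.g. no swap)."""
    return 100.0 * part / total if total else 0.0


def should_intervene(meminfo, mem_min_pct=10.0, swap_min_pct=10.0):
    """True once BOTH available memory and free swap are at or below their minimums."""
    mem_avail = percent(meminfo["MemAvailable"], meminfo["MemTotal"])
    swap_free = percent(meminfo["SwapFree"], meminfo["SwapTotal"])
    return mem_avail <= mem_min_pct and swap_free <= swap_min_pct
```

On a live system the input would be parse_meminfo(open("/proc/meminfo").read()), polled periodically. Choosing which process to signal is a separate step that this sketch doesn't cover.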

-- 
Chris Murphy