earlyoom by default

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Mon, 6 Jan 2020 11:18:53 -0700

Hi server@ and cloud@ folks,

There is a system-wide change to enable earlyoom by default on Fedora
Workstation. At today's Workstation working group meeting, it was
suggested that I give you folks a heads-up in case you want to opt
into this change.

Proposal
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
Devel@ discussion
https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/message/YXDODS3G4YCS7MT4J2QJMJ7EXCVR7NQ2/

The main issue on a workstation, heavy swapping that leaves the system
unresponsive, is perhaps less immediately frustrating on a server. But
the consequences of an indefinite hang, or of the kernel oom-killer
triggering (which sends SIGKILL), are perhaps worse.

On the plus side, earlyoom is easy to understand, and its first
attempt is a SIGTERM rather than a SIGKILL. Like the kernel
oom-killer, it uses oom_score to choose the victim.
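To make the victim selection concrete, here is a minimal Python sketch
(not earlyoom's own code, which is written in C) that reads the
per-process score the kernel exposes at /proc/<pid>/oom_score and
picks the highest. The function names are hypothetical, and processes
that exit mid-scan are simply skipped.

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return {pid: oom_score} for every process that can be read."""
    scores = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        path = os.path.join(proc_root, entry, "oom_score")
        try:
            with open(path) as f:
                scores[int(entry)] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # process exited or is unreadable: ignore it
    return scores


def pick_victim(scores):
    """Return the pid with the highest oom_score, or None if empty."""
    if not scores:
        return None
    return max(scores, key=scores.get)
```

On a Linux host, pick_victim(read_oom_scores()) returns the pid this
sketch would signal.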

The SIGTERM is sent to the process with the highest oom_score only
once both free memory and free swap drop to 10%. A SIGKILL is sent to
the process with the highest oom_score once both drop to 5%. Those
percentages can be tweaked, but the KILL percentage is always half of
the TERM percentage, so it's a bit rudimentary.
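The threshold rule can be written out as a small decision function.
This is a Python sketch under stated assumptions, not earlyoom's
implementation: it takes "free memory" to be MemAvailable/MemTotal
from /proc/meminfo, treats a machine with no swap as having 0% swap
free, fires when a value reaches (is at or below) a threshold, and
fixes the kill threshold at half the term threshold as described
above. meminfo_percentages and choose_signal are hypothetical names.

```python
def meminfo_percentages(meminfo_text):
    """Return (mem_free_pct, swap_free_pct) from /proc/meminfo text.

    Assumption: free memory means MemAvailable, and no swap counts as
    0% swap free, so only the RAM threshold can hold things back.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key] = int(rest.split()[0])  # values are in kB
    mem_pct = 100 * fields["MemAvailable"] / fields["MemTotal"]
    swap_total = fields.get("SwapTotal", 0)
    swap_pct = 100 * fields["SwapFree"] / swap_total if swap_total else 0.0
    return mem_pct, swap_pct


def choose_signal(mem_free_pct, swap_free_pct, term_pct=10.0):
    """Return "SIGKILL", "SIGTERM", or None for the given percentages.

    Both memory AND swap must be at or below a threshold for it to fire.
    """
    kill_pct = term_pct / 2  # KILL threshold is always half of TERM
    if mem_free_pct <= kill_pct and swap_free_pct <= kill_pct:
        return "SIGKILL"
    if mem_free_pct <= term_pct and swap_free_pct <= term_pct:
        return "SIGTERM"
    return None
```

Note how plenty of free swap keeps both signals from firing, however
low RAM gets; that is the "both memory and swap" condition.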

One small concern I have: what if there's no swap? That's probably
uncommon for servers, but I'm not sure about cloud. In that case,
SIGTERM happens once free RAM drops to 10%, which leaves a lot of
memory on the table; for a server with significant resources that's
probably too high. What about 4%? Maybe still too high? One option I'm
considering is a systemd condition that would not run earlyoom on
systems without a swap device, which would leave those systems no
worse off than they are now. [I.e., they eventually recover (?), hang
indefinitely (likely), or the oom-killer finally kills something
(less likely).]
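The no-swap check, and the amount of RAM each threshold leaves unused,
can be sketched as follows. This assumes the Linux /proc/swaps format
(a header line followed by one line per active swap area);
has_active_swap and unused_ram_gib are hypothetical helpers, and how
the check would be wired into a systemd unit is left open, as in the
message.

```python
def has_active_swap(proc_swaps_text):
    """True if /proc/swaps lists at least one active swap area.

    The first non-empty line is the header
    ("Filename  Type  Size  Used  Priority"); anything after it
    describes a swap device or file.
    """
    lines = [ln for ln in proc_swaps_text.splitlines() if ln.strip()]
    return len(lines) > 1


def unused_ram_gib(total_ram_gib, threshold_pct):
    """RAM (GiB) still free when a percentage threshold fires."""
    return total_ram_gib * threshold_pct / 100
```

For a hypothetical 256 GiB server, unused_ram_gib(256, 10) is 25.6 GiB
still free when SIGTERM fires, versus 10.24 GiB at 4%.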

I've been testing earlyoom, nohang, and the kernel oom-killer for more
than six months now, and I think it would be completely sane for the
Server and Cloud products to enable earlyoom by default for fc32, while
evaluating other, more server-oriented solutions (e.g. nohang, oomd,
possibly others) for fc33/fc34. What is clear: this isn't going to be
solved by the kernel folks. The kernel oom-killer only cares about
keeping the kernel alive; it doesn't care about user space at all.

In the cases where this becomes a problem, the kernel either hangs
indefinitely or sends SIGKILL to your database or whatever else is
eating up resources. By contrast, earlyoom's first attempt is a
SIGTERM, so the process at least has a chance to quit gracefully.
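Why a SIGTERM first matters: a process can catch SIGTERM and clean up,
whereas SIGKILL cannot be caught, blocked, or ignored. The Python
sketch below shows the common pattern for a toy service (the class and
its methods are hypothetical): the handler only records the request,
and the main loop stops taking new work and flushes what it already
has before exiting.

```python
import signal


class GracefulWorker:
    """Toy service that stops cleanly when it receives SIGTERM.

    process() and flush() stand in for real work, such as applying
    transactions and syncing them to disk.
    """

    def __init__(self):
        self.stop_requested = False
        self.processed = []
        self.flushed = False

    def install(self):
        # Must be called from the main thread. SIGKILL cannot be handled.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Keep the handler tiny: just record that we were asked to stop.
        self.stop_requested = True

    def process(self, item):
        self.processed.append(item)

    def flush(self):
        self.flushed = True

    def run(self, items):
        for item in items:
            if self.stop_requested:
                break  # stop taking new work once asked to quit
            self.process(item)
        self.flush()  # always persist completed work before exiting
```

A real service would call install() once from its main thread before
entering run().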

There are some concerns, which are covered in the devel@ thread, and I
expect they'll be adequately addressed or the feature will not pass the
FESCo vote. But as a short-term solution while more sophisticated ones
are evaluated, I think this is a good call, so I thought I'd mention it
in case you folks want to be included in the change.

-- 
Chris Murphy
_______________________________________________
cloud mailing list -- cloud@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to cloud-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/cloud@xxxxxxxxxxxxxxxxxxxxxxx