Re: earlyoom by default

Dusty Mabe <dusty@xxxxxxxxxxxxx> · Mon, 6 Jan 2020 21:55:58 -0500

On 1/6/20 1:18 PM, Chris Murphy wrote:
> Hi server@ and cloud@ folks,
> 
> There is a system-wide change to enable earlyoom by default on Fedora
> Workstation. It came up in today's Workstation working group meeting
> that I should give you folks a heads up about opting into this change.

Thanks for the heads up!

> 
> Proposal
> https://fedoraproject.org/wiki/Changes/EnableEarlyoom
> Devel@ discussion
> https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/message/YXDODS3G4YCS7MT4J2QJMJ7EXCVR7NQ2/
> 
> The main issue on a workstation, heavy swap leading to an unresponsive
> system, is perhaps not as immediately frustrating on a server.  But
> the consequences of indefinite hang or the kernel oom-killer
> triggering, which is a SIGKILL, are perhaps worse.
> 
> On the plus side, earlyoom is easy to understand, and its first
> attempt is a SIGTERM rather than SIGKILL. It uses oom_score, same as
> kernel oom-killer, to determine the victim.
> 
> The SIGTERM is issued to the process with the highest oom_score only
> if both memory and swap reach 10% free. And SIGKILL is issued to the
> process with the highest oom_score once memory and swap reach 5% free.
> Those percentages can be tweaked, but the KILL percentage is always
> 1/2 of the TERM  percentage, so it's a bit rudimentary.

Yeah. Adding more ways to relate SIGTERM to SIGKILL (other than the fixed
1/2 ratio) would be nice.
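
To make the current rule concrete, here is a rough Python sketch of the
decision as Chris describes it above. This is not earlyoom's actual code. The
function name is made up, "reach" is read as "at or below", and a swapless
system is modeled by passing swap_free_pct=0; all of those are my assumptions.

```python
from typing import Optional


def earlyoom_action(mem_free_pct: float, swap_free_pct: float,
                    term_pct: float = 10.0) -> Optional[str]:
    """Return "SIGKILL", "SIGTERM", or None for the given free percentages.

    Hypothetical helper: models only the rule described in this thread.
    """
    # The KILL threshold is always half of the TERM threshold.
    kill_pct = term_pct / 2
    # A threshold only fires when BOTH memory and swap are at or below it.
    if mem_free_pct <= kill_pct and swap_free_pct <= kill_pct:
        return "SIGKILL"
    if mem_free_pct <= term_pct and swap_free_pct <= term_pct:
        return "SIGTERM"
    return None
```

With the defaults, earlyoom_action(8, 3) gives "SIGTERM" and
earlyoom_action(5, 5) gives "SIGKILL", while earlyoom_action(4, 20) gives None
because swap is still above the threshold.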

> 
> One small concern I have is, what if there's no swap? That's probably
> uncommon for servers, but I'm not sure about cloud. But in this case,

For cloud, at least, it's very common not to have swap. I'd argue you don't
want servers swapping either, but their resources aren't as elastic as in the
cloud, so you might not be able to burst capacity the way you can there.
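
For what it's worth, checking whether an instance has any swap is cheap. A
minimal sketch, assuming the usual "SwapTotal:  <n> kB" line in /proc/meminfo;
the function names are just illustrative:

```python
def swap_total_kib(meminfo_text: str) -> int:
    """Return SwapTotal in KiB from /proc/meminfo-style text, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:       8388604 kB"
            return int(line.split()[1])
    return 0


def has_swap(meminfo_path: str = "/proc/meminfo") -> bool:
    """True if the system reports a non-zero amount of configured swap."""
    with open(meminfo_path) as f:
        return swap_total_kib(f.read()) > 0
```

A wrapper or unit condition could use a check like this to decide which
thresholds apply, or whether to start earlyoom at all.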

> SIGTERM happens at 10% of RAM, which leaves a lot of memory on the
> table, and for a server with significant resources it's probably too
> high. What about 4%? Maybe still too high? One option I'm thinking of
> is a systemd conditional that would not run earlyoom on systems
> without a swap device, which would leave these systems no worse off
> than they are right now. [i.e. they eventually recover (?),
> indefinitely hang (likely), or oom-killer finally kills something
> (less likely).]

Seems like on these systems it would be nice to have earlyoom send SIGTERM
just before SIGKILL, i.e. try the nice way first and then bring in the hammer.
In this case a 1% difference in threshold would be useful, e.g. SIGTERM at
5% and SIGKILL at 4%, or something like that.
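
Roughly what I have in mind for the swapless case, as a Python sketch with
independent thresholds instead of the fixed 1/2 ratio. The Thresholds class
and action_for are hypothetical names for illustration, not anything earlyoom
provides, and the 5%/4% defaults are just the numbers floated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Independent SIGTERM/SIGKILL thresholds, in percent of free memory."""
    term_pct: float = 5.0
    kill_pct: float = 4.0

    def __post_init__(self) -> None:
        # SIGKILL must never trigger before SIGTERM has had a chance.
        if not 0 <= self.kill_pct <= self.term_pct <= 100:
            raise ValueError("need 0 <= kill_pct <= term_pct <= 100")


def action_for(mem_free_pct: float,
               t: Thresholds = Thresholds()) -> Optional[str]:
    """Swapless case: only free memory is considered."""
    if mem_free_pct <= t.kill_pct:
        return "SIGKILL"
    if mem_free_pct <= t.term_pct:
        return "SIGTERM"
    return None
```

So at 4.5% free the victim gets SIGTERM, and only if memory keeps dropping to
4% does it get SIGKILL.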

> 
> I've been testing earlyoom, nohang, and the kernel oom-killer for > 6
> months now, and I think it would be completely sane for Server and
> Cloud products to enable earlyoom by default for fc32, while
> evaluating other solutions that can be more server oriented (e.g.
> nohang, oomd, possibly others) for fc33/fc34. What is clear: this
> isn't going to be solved by kernel folks, the kernel oom-killer only
> cares about keeping the kernel alive, it doesn't care about user space
> at all.
> 
> In the cases where this becomes a problem, either the kernel hangs
> indefinitely or does SIGKILL for your database or whatever is eating
> up resources. Whereas at least earlyoom's first attempt is a SIGTERM
> so it has a chance of gracefully quitting.
> 
> There are some concerns, those are in the devel@ thread, and I expect
> they'll be adequately addressed or the feature will not pass the FESCo
> vote. But as a short term solution while evaluating more sophisticated
> solutions, I think this is a good call so I thought I'd just mention
> it, in case you folks want to be included in the change.
> 
> 

Thanks!
_______________________________________________
cloud mailing list -- cloud@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to cloud-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/cloud@xxxxxxxxxxxxxxxxxxxxxxx