Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

Zbigniew Jędrzejewski-Szmek <zbyszek@xxxxxxxxx> · Sun, 5 Jan 2020 12:31:17 +0000

On Sun, Jan 05, 2020 at 12:29:40PM +0100, Aleksandra Fedorova wrote:
> On Sun, Jan 5, 2020 at 10:18 AM Zbigniew Jędrzejewski-Szmek
> <zbyszek@xxxxxxxxx> wrote:
> >
> > On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
> > > On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova <alpha@xxxxxxxxxxxx> wrote:
> > >
> > > > Since in the Change we are not introducing just the earlyoom tool but enable it with a specific profile I would add those details here. Smth like:
> > > >
> > > > "earlyoom service will choose the offending process based on the same oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, and SIGKILL on 5%"
> > >
> > > I add this information to the summary. Also, I think these numbers may
> > > need to change to avoid prematurely sending SIGTERM when the system
> > > has no swap device.
> > >
> > > > As I understand in the current setup we are looking more for a controlled failure scenario rather than for a solution.
> > >
> > > Yes, it's fair to say this proposal is to make things "less bad". It
> > > doesn't improve system responsiveness. Once heavy swap starts, the
> > > system is sluggish, stutters, and briefly stalls. This proposal
> > > doesn't fix that. There is a lot of room for improvement.
> > >
> > >
> > > > Can we get a specific manual, what users supposed to do, once they trigger the earlyoom? Does earlyoom help in reporting? Which logs we need to look at?
> > > >
> > > > Maybe add a section in UX part of the change, or setup a dedicated wiki page?
> > >
> > > The user shouldn't need to do anything differently than if the kernel
> > > oom-killer had triggered. The system journal will contain messages
> > > showing what was killed and why:
> > >
> > > Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below
> > > SIGTERM limits: mem 10 %, swap 10 %
> > > Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process
> > > 27421 "chrome": badness 305, VmRSS 42 MiB
> > >
> > >
> > > > Additionally, there was a question during the chat discussion: how the earlyoom setup will work together with OOMPolicy and any other related options of systemd units? Will systemd recognize the OOM event?
> > >
> > > My understanding of systemd OOMPolicy= behavior, is it looks for the
> > > kernel's oom-killer messages and acts upon those. Whereas earlyoom
> > > uses the same metric (oom_score) as the oom-killer, it does not invoke
> > > the oom-killer. Therefore systemd probably does not get the proper
> > > hint to implement OOMPolicy=
> >
> > Yes. The kernel reports oom events in the cgroup file memory.events,
> > and systemd waits for an inotify event on that file; OOMPolicy=stop is
> > implemented that way. And the OOMPolicy=kill option is "implemented"
> > by setting memory.oom.group=1 in the kernel [1] and having the kernel
> > kill all the processes. So systemd is providing a thin wrapper around
> > the kernel functionality.
> >
> > If processes are not killed by the kernel but through a signal from
> > userspace, all of this will not work.
> 
> I grepped /usr/lib/systemd and /etc/systemd for "OOM" on my
> workstation and it seems that we have only OOMScoreAdjust option used
> in the installed systemd units. And this option will be respected by
> earlyoom.
> 
> Since on workstation we don't use tweaking of the OOMPolicy on the
> unit level, I'd say we can leave the tweaking to the system
> administrators: when there is need to adjust OOMPolicy of a service,
> administrators would need to tweak or disable earlyoom service as
> well.

Having "conflicts" between things, in the sense that using one feature
means that another feature needs to be disabled, is always an option.
But it's never a very good option. I don't think it's critical to keep
OOMPolicy= working, since it's a new and relatively unused thing.
Nevertheless, it would be nice to find a way to support both features
at the same time. So far this thread has mostly been about establishing
whether there really is a conflict (it seems so) and whether there is
some easy way to avoid it (not sure yet...). I think we should explore
that before settling on the easy but suboptimal answer.
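
To make the conflict concrete: as noted above, the kernel reports its own
OOM kills in the cgroup file memory.events, and that is what systemd watches.
A kill delivered by a userspace signal never shows up there. Below is a
minimal sketch of reading that file's "key value" lines. The function names
are made up for illustration and this is not how systemd itself is written.

```python
def parse_memory_events(text):
    """Parse the 'key value' lines of a cgroup v2 memory.events file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Each line is expected to look like "oom_kill 1"; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def kernel_oom_kills(before, after):
    """Return how many kernel OOM kills happened between two snapshots."""
    return after.get("oom_kill", 0) - before.get("oom_kill", 0)
```

To try it against a real unit, read the memory.events file from that unit's
cgroup directory (the exact path depends on the cgroup layout) and pass its
contents to parse_memory_events(). After earlyoom sends SIGTERM or SIGKILL,
the oom_kill counter should be unchanged, which is why OOMPolicy= never
triggers.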

> But I'd like to understand better the difference between _default_
> OOM-event and _default_ earlyoom-event:
> 
> Afaik DefaultOOMPolicy is set to "stop", which means if one of the
> processes in the service is killed by OOM, other processes from the
> same service are gracefully stopped by systemd.
> 
> What is the default behavior of the systemd service on external
> SIGTERM/SIGKILL signal sent to the process by earlyoom?

It depends on which of the processes is killed. If the main process
is killed with SIGTERM, systemd will consider this a normal, successful
termination. If the main process is killed with SIGKILL, systemd will
consider this a failure. (Both of those cases can be modified by
SuccessExitStatus=.) If some random subprocess is killed, systemd will
not care at all. So in general, just killing a subprocess with SIGTERM
results, at the very least, in systemd reporting successful termination
when it shouldn't.
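
The distinction above can be observed from any parent process. Below is a
rough sketch, assuming a POSIX system: it starts a child, kills it with a
given signal, and then applies the rule described above to the wait status.
Both run_and_kill() and classify_main_exit() are hypothetical helpers
written for this mail. They only mimic the default behaviour described
here, not systemd's actual code, and they ignore SuccessExitStatus=.

```python
import signal
import subprocess
import sys


def run_and_kill(sig):
    """Start a sleeping child, send it `sig`, and return its return code.

    On POSIX, subprocess reports death by signal N as return code -N.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(sig)
    return child.wait()


def classify_main_exit(returncode):
    """Mimic the rule above for a service's main process (illustrative only)."""
    # Exit code 0 or death by SIGTERM is treated as a clean termination.
    if returncode == 0 or returncode == -signal.SIGTERM:
        return "success"
    # Anything else, including death by SIGKILL, counts as a failure.
    return "failure"
```

With this, classify_main_exit(run_and_kill(signal.SIGTERM)) gives "success"
and classify_main_exit(run_and_kill(signal.SIGKILL)) gives "failure". From
the parent's point of view, a SIGTERM sent by earlyoom looks the same as an
ordinary clean stop.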

Zbyszek