On Sun, Jan 05, 2020 at 12:29:40PM +0100, Aleksandra Fedorova wrote:
> On Sun, Jan 5, 2020 at 10:18 AM Zbigniew Jędrzejewski-Szmek
> <zbyszek@xxxxxxxxx> wrote:
> >
> > On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
> > > On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova <alpha@xxxxxxxxxxxx> wrote:
> > >
> > > > Since in the Change we are not introducing just the earlyoom tool but enable it with a specific profile I would add those details here. Smth like:
> > > >
> > > > "earlyoom service will choose the offending process based on the same oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, and SIGKILL on 5%"
> > >
> > > I add this information to the summary. Also, I think these numbers may
> > > need to change to avoid prematurely sending SIGTERM when the system
> > > has no swap device.
> > >
> > > > As I understand in the current setup we are looking more for a controlled failure scenario rather than for a solution.
> > >
> > > Yes, it's fair to say this proposal is to make things "less bad". It
> > > doesn't improve system responsiveness. Once heavy swap starts, the
> > > system is sluggish, stutters, and briefly stalls. This proposal
> > > doesn't fix that. There is a lot of room for improvement.
> > >
> > >
> > > > Can we get a specific manual, what users supposed to do, once they trigger the earlyoom? Does earlyoom help in reporting? Which logs we need to look at?
> > > >
> > > > Maybe add a section in UX part of the change, or setup a dedicated wiki page?
> > >
> > > The user shouldn't need to do anything differently than if the kernel
> > > oom-killer had triggered. The system journal will contain messages
> > > showing what was killed and why:
> > >
> > > Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below SIGTERM limits: mem 10 %, swap 10 %
> > > Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process 27421 "chrome": badness 305, VmRSS 42 MiB
> > >
> > >
> > > > Additionally, there was a question during the chat discussion: how the earlyoom setup will work together with OOMPolicy and any other related options of systemd units? Will systemd recognize the OOM event?
> > >
> > > My understanding of systemd OOMPolicy= behavior, is it looks for the
> > > kernel's oom-killer messages and acts upon those. Whereas earlyoom
> > > uses the same metric (oom_score) as the oom-killer, it does not invoke
> > > the oom-killer. Therefore systemd probably does not get the proper
> > > hint to implement OOMPolicy=
> >
> > Yes. The kernel reports oom events in the cgroup file memory.events,
> > and systemd waits for an inotify event on that file; OOMPolicy=stop is
> > implemented that way. And the OOMPolicy=kill option is "implemented"
> > by setting memory.oom.group=1 in the kernel [1] and having the kernel
> > kill all the processes. So systemd is providing a thin wrapper around
> > the kernel functionality.
> >
> > If processes are not killed by the kernel but through a signal from
> > userspace, all of this will not work.
>
> I grepped /usr/lib/systemd and /etc/systemd for "OOM" on my
> workstation and it seems that we have only OOMScoreAdjust option used
> in the installed systemd units. And this option will be respected by
> earlyoom.
>
> Since on workstation we don't use tweaking of the OOMPolicy on the
> unit level, I'd say we can leave the tweaking to the system
> administrators: when there is need to adjust OOMPolicy of a service,
> administrators would need to tweak or disable earlyoom service as
> well.

Having "conflicts" between things, in the sense that using one feature
means that another feature needs to be disabled, is always an option.
But it's never a very good option.
I think that it isn't too important to keep OOMPolicy= working, since
it's a new and relatively unused thing. Nevertheless, it would be nice
to find a way to avoid this and support both features at the same
time. This thread so far has mostly been about establishing whether
there really is a conflict (it seems yes) and whether there is some
easy way to avoid it (not sure yet...). I think we should explore that
before settling on the easy but suboptimal answer.

> But I'd like to understand better the difference between _default_
> OOM-event and _default_ earlyoom-event:
>
> Afaik DefaultOOMPolicy is set to "stop", which means if one of the
> processes in the service is killed by OOM, other processes from the
> same service are gracefully stopped by systemd.
>
> What is the default behavior of the systemd service on external
> SIGTERM/SIGKILL signal sent to the process by earlyoom?

It depends on which of the processes is killed. If the main process is
killed with SIGTERM, systemd will consider this a normal successful
termination. If the main process is killed with SIGKILL, systemd will
consider this a failure. (Both of those cases can be modified by
SuccessExitStatus=.) If some random subprocess is killed, systemd will
not care at all.

So in general, just killing a subprocess with SIGTERM results at least
in systemd reporting successful termination when it shouldn't.

Zbyszek
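
To make the SIGTERM vs. SIGKILL distinction above a bit more concrete, here is a rough Python sketch of the default verdict for a service's main process. It is only an approximation and not systemd code: the set of "clean" signals follows what systemd.service(5) documents as the default, while is_clean_termination(), run_and_kill() and the extra_success parameter (a stand-in for SuccessExitStatus=) are hypothetical names made up for illustration.

```python
import signal
import subprocess
import sys

# Signals that systemd.service(5) documents as clean terminations by
# default, alongside exit status 0. SuccessExitStatus= can extend this.
CLEAN_SIGNALS = {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}


def is_clean_termination(returncode, extra_success=frozenset()):
    """Approximate the default verdict for a service's main process.

    `returncode` follows the subprocess convention: a negative value -N
    means the child was terminated by signal N.
    `extra_success` is a hypothetical stand-in for SuccessExitStatus=.
    """
    if returncode == 0:
        return True
    if returncode < 0:
        signum = -returncode
        return signum in CLEAN_SIGNALS or signum in extra_success
    return returncode in extra_success


def run_and_kill(sig):
    """Start a sleeping child, kill it with `sig`, return its returncode."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(sig)
    return child.wait()


if __name__ == "__main__":
    for sig in (signal.SIGTERM, signal.SIGKILL):
        rc = run_and_kill(sig)
        verdict = "success" if is_clean_termination(rc) else "failure"
        print(f"main process killed by {sig.name}: reported as {verdict}")
```

Under these assumptions, a main process that earlyoom ends with SIGTERM is indistinguishable from one that was stopped deliberately, which is the misreporting described above.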