Hi,

On Wed, 2021-07-28 at 13:33 +1000, David Airlie wrote:
> > From your description, something obviously went wrong: either
> > assignment of cgroups has failed and everything is in the same big
> > group, or sd-oomd made a bad shot. systemd-cgls should show which it is.
>
> Thanks for the hint, systemd-cgls at least makes it appear as if
> everything is in different slice
>
> ─user.slice
> │ └─user-1000.slice
> │ ├─user@1000.service
> │ │ ├─session.slice
>
> │ │ ├─app.slice
>
> │ │ │ ├─app-org.gnome.Terminal.slice
> │ │ │ │ ├─vte-spawn-f4a41678-fa09-4ab7-b6d4-d89e18bdb5f4.scope
>
> I do find it strange it picks
> Killed
> /user.slice/user-1000.slice/user@1000.service/session.slice/org.gnome.Shell@wayland.service
>
> I might have to dig into systemd-oomd to see why it picks totally the
> wrong option here.

systemd-oomd mostly goes by the rate of "pgscan" (pages scanned for reclaim) in the cgroup memory statistics. So, if you are in a heavy swap situation and the kernel tries to evict pages from gnome-shell, then it is entirely possible for gnome-shell to become the victim.

The main guard we currently have against that is the memory protection assigned by uresourced (which is only a very partial guard). On Fedora Workstation you should have "uresourced" running, which assigns a memory allocation of 250 MiB (262144000 bytes) to the current user and their session.slice:

$ cat /sys/fs/cgroup/user.slice/memory.min
262144000
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/memory.min
262144000
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.min
262144000
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/session.slice/memory.min
262144000

Basically, right now we Hope (TM) that this gives gnome-shell enough space to keep its working set in memory, so that it continues running smoothly and systemd-oomd does not consider it a candidate. Note that you can configure the size of this memory allocation by editing /etc/uresourced.conf[1].

For now, we don't have another solution.
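The four cat commands above can be wrapped into one small check that walks from user.slice down to session.slice and prints each memory.min in bytes and MiB. This is a sketch assuming the unified cgroup v2 hierarchy mounted at /sys/fs/cgroup (as in the paths above); show_memory_min is a hypothetical helper name, not a shipped tool.

```shell
#!/usr/bin/env bash
# Sketch: print memory.min for each level from user.slice down to session.slice.
# Assumes cgroup v2 mounted at the given root (normally /sys/fs/cgroup).
# show_memory_min is a hypothetical helper, not part of uresourced or systemd.
set -euo pipefail

show_memory_min() {
  local root="$1" uid="$2" level file bytes
  local levels=(
    "user.slice"
    "user.slice/user-${uid}.slice"
    "user.slice/user-${uid}.slice/user@${uid}.service"
    "user.slice/user-${uid}.slice/user@${uid}.service/session.slice"
  )
  for level in "${levels[@]}"; do
    file="$root/$level/memory.min"
    if [[ ! -r "$file" ]]; then
      echo "$level: no memory.min"
    else
      bytes=$(<"$file")
      if [[ "$bytes" == max ]]; then
        # "max" protects all of the cgroup's memory usage.
        echo "$level: max"
      else
        # 1 MiB = 1048576 bytes, so 262144000 bytes is exactly 250 MiB.
        echo "$level: $bytes bytes ($(( bytes / 1048576 )) MiB)"
      fi
    fi
  done
}

# Usage on a real system:
#   show_memory_min /sys/fs/cgroup "$(id -u)"
```

If any level prints a smaller value (or "no memory.min"), uresourced is probably not running or has not applied its allocation to that part of the tree.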
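To make the "rate of pgscan" selection described above concrete, here is a toy model: take two snapshots of the pgscan counter from each candidate cgroup's memory.stat, some seconds apart, and pick the cgroup whose counter grew fastest. This is a simplified illustration, not systemd-oomd's actual code. It ignores the pressure threshold that triggers a kill in the first place and any other rules the real daemon applies. The function names are hypothetical, and cgroup paths are assumed to contain no spaces.

```shell
#!/usr/bin/env bash
# Toy model of "kill the cgroup with the highest pgscan rate".
# NOT systemd-oomd's implementation; snapshot_pgscan and pick_victim are hypothetical.
set -euo pipefail

# Print "<cgroup-path> <pgscan>" for each cgroup directory given.
# Matches the exact "pgscan" key only, not pgscan_kswapd / pgscan_direct.
snapshot_pgscan() {
  local cg
  for cg in "$@"; do
    awk -v cg="$cg" '$1 == "pgscan" { print cg, $2 }' "$cg/memory.stat"
  done
}

# Join two snapshots taken <interval> seconds apart and print the cgroup
# with the highest pgscan rate (pages scanned per second) and that rate.
pick_victim() {
  local before="$1" after="$2" interval="$3"
  awk -v t="$interval" '
    NR == FNR { start[$1] = $2; next }
    ($1 in start) {
      rate = ($2 - start[$1]) / t
      if (!found || rate > best) { best = rate; victim = $1; found = 1 }
    }
    END { if (found) print victim, best }
  ' "$before" "$after"
}

# Usage sketch (paths as in the systemd-cgls output quoted above):
#   svc=/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service
#   snapshot_pgscan "$svc/session.slice" "$svc/app.slice" > /tmp/before
#   sleep 5
#   snapshot_pgscan "$svc/session.slice" "$svc/app.slice" > /tmp/after
#   pick_victim /tmp/before /tmp/after 5
```

Under this model, a cgroup whose pages the kernel is actively scanning (for example gnome-shell's, while the system is swapping heavily) wins regardless of how important it is, which is why the memory.min protection matters.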
A better solution requires reworking the systemd-oomd victim selection algorithm so that users can steer it away from important processes (for example, by excluding session.slice). I have some ideas for that, but they depend on having some heuristic for whether killing a cgroup is going to improve the situation.

Benjamin

[1] To apply, you'll need to run "systemctl restart uresourced.service" and "systemctl --user daemon-reload".
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure