On Mon, Dec 21, 2020 at 11:29 AM Ben Cotton <bcotton@xxxxxxxxxx> wrote:
>
> https://fedoraproject.org/wiki/Changes/EnableSystemdOomd
>
> == Summary ==
>
> Provide a better experience for Fedora users in out-of-memory (OOM)
> situations by enabling
> [https://www.freedesktop.org/software/systemd/man/systemd-oomd.html
> systemd-oomd] by default. Actions taken by systemd-oomd operate on a
> per-cgroup level, aligning well with the life cycle of systemd units.
> systemd-oomd primarily uses [https://facebookmicrosites.github.io/psi/
> Linux pressure stall information (PSI)] to make decisions based on
> wasted productivity due to resource shortages; in addition to that, it
> also supports swap based actions.
>
> == Owners ==
>
> * Name: [[User:anitazha|Anita Zhang]], [[User:Dcavalca|Davide
> Cavalca]], [[User:Salimma|Michel Salim]], [[User:Htejun|Tejun Heo]],
> [[User:3ki|Rik van Riel]]
> * Email: the.anitazha@xxxxxxxxx, dcavalca@xxxxxx,
> michel@xxxxxxxxxxxxxxx, htejun@xxxxxx, riel@xxxxxx
>
>
> == Detailed description ==
>
> The primary mechanism used by systemd-oomd for detecting when the
> system is out of memory is memory pressure. Memory pressure measures
> the percentage of time a cgroup has “wasted” due to lack of memory.
> This includes time spent reclaiming free memory, faulting in recently
> resident pages, and loading in anonymous pages from swap. When a
> monitored cgroup’s memory pressure exceeds the specified thresholds,
> systemd-oomd will perform action(s) on the targeted cgroup’s
> descendants, starting from the cgroups with the most reclaim scans.
> Reclaim activity is used here, rather than the largest consumer, as it
> reflects values set in the cgroup memory controller for memory
> protection (such as memory.low).
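For anyone who wants to see what that pressure figure looks like in practice, here is a rough Python sketch of parsing the kernel's PSI text format and comparing one value against a limit. It is only an illustration, not how systemd-oomd is implemented. The helper names (parse_psi, over_limit), the sample numbers, and the choice of the "full" line with the avg10 window are all my own assumptions; the proposal does not say which line or window is used.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    parsed = {}
    for line in text.strip().splitlines():
        # Each line looks like: "full avg10=4.75 avg60=1.30 avg300=0.25 total=65432"
        kind, *fields = line.split()
        parsed[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return parsed


def over_limit(text, limit_percent, kind="full", window="avg10"):
    """Return True if the chosen pressure average exceeds limit_percent.

    kind and window are assumptions made for this sketch.
    """
    return parse_psi(text)[kind][window] > limit_percent


# Made-up sample in the format of a PSI memory pressure file.
sample = (
    "some avg10=6.20 avg60=2.10 avg300=0.50 total=123456\n"
    "full avg10=4.75 avg60=1.30 avg300=0.25 total=65432\n"
)

print(over_limit(sample, 4.0))  # True: 4.75 > 4.0
```

On a kernel with PSI enabled, the same text can be read from /proc/pressure/memory system-wide, or from a cgroup's memory.pressure file on cgroup v2, and passed to over_limit() in place of the sample string.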
>
> For memory pressure configuration, this will be
> ManagedOOMMemoryPressure=kill and ManagedOOMMemoryPressureLimit=4% on
> user@.service to have systemd-oomd send SIGKILLs to all processes
> under a selected cgroup when total memory pressure on all tasks
> exceeds 4% for 10 seconds.
>
> For swap based actions, systemd-oomd will monitor the system-wide swap
> space and act when available swap falls below the configured
> threshold, starting with the cgroups with the highest swap usage to
> the least. Keeping some amount of swap (if enabled) available will
> prevent the kernel OOM killer from killing processes unpredictably and
> spending an unbounded amount of time afterwards.
>
> For swap configuration, this will be SwapUsedLimitPercent=90% in
> oomd.conf and ManagedOOMSwap=kill on -.slice (root cgroup slice) to
> have systemd-oomd send SIGKILLs to all processes under a cgroup when
> swap used exceeds 90%.
>
>
> == Benefit to Fedora ==
>
> * Addressing the issue of improving user feedback in
> https://pagure.io/fedora-workstation/issue/202, systemd-oomd currently
> logs to the journal if pressure or swap action is about to occur.
> There are also debug logs, for each process that is sent a SIGKILL,
> that can be bumped up in priority. Further notification mechanisms
> (i.e. over dbus) can also be implemented depending on feedback.
> * While systemd-oomd is simpler in configuration to the oomd used at
> Facebook, the algorithm is largely the same. As such, the following
> case study can be used as an example of how PSI and cgroup killing can
> release memory not normally resolved with process killing and lead to
> better utilization:
> https://facebookincubator.github.io/oomd/docs/oomd-casestudy.html
> * OOM killing in userspace, before the kernel OOM killer kicks in, has
> been shown to be effective at keeping a system functional.
> An OOM kill in the kernel is slow, possibly leading to an unbounded
> amount of time swapping in and out pages and evicting the page cache.
> * PSI based actions, versus looking at raw memory consumption numbers,
> better reflect memory protection policies set for cgroup resource
> control limits (e.g. memory.low).
>
> == Scope ==
>
> * Proposal owners:
> ** Implement and land additional refinements to systemd-oomd
> *** Remove swap as a hard requirement to running systemd-oomd
> *** Expand ManagedOOM*= properties to user units (currently only
> usable on system units)
> *** Configurable memory pressure time window knob
> ** Enable oomd by default with sensible configuration
> ** Test days
> ** Aid with documentation
> * Other developers:
> ** systemd: review PRs as needed
> * Release engineering: https://pagure.io/releng/issue/9913
> * Policies and guidelines: N/A
> * Trademark approval: N/A
>
> == Upgrade/compatibility impact ==
>
> Existing systems running earlyoom will not be modified. One can
> transition to systemd-oomd via:
>
> <pre>sudo systemctl disable --now earlyoom
> sudo systemctl enable --now systemd-oomd</pre>
> Systems that were previously not running earlyoom will have
> systemd-oomd enabled by default.
>
> == How to test ==
>
> systemd 247 build for Fedora includes all the artifacts for
> systemd-oomd. It is disabled by default but can be started with:
>
> <pre>sudo systemctl enable --now systemd-oomd</pre>
> At this point you can decide which units to set properties on. For
> example, to enable swap-based killing on all units below the root
> slice:
>
> <pre>sudo systemctl edit --force -- -.slice
> [Slice]
> ManagedOOMSwap=kill
> # save and exit</pre>
>
> Note that the following memory pressure example requires the changes
> listed in “Scope” to work as expected, as systemd-oomd shipped with
> systemd v247 does not support changing the time window for memory
> pressure.
> This example was run on a system with swap:
>
> <pre>systemctl edit user@.service
> [Service]
> ManagedOOMMemoryPressure=kill
> ManagedOOMMemoryPressureLimit=4%
> # save and exit
>
> systemd-run --user tail /dev/zero # will lead to a lot of reclaim and then OOM if not killed</pre>
>
> == User experience ==
>
> This should be a fully transparent change for users.
>
> == Dependencies ==
>
> None. If changes to oomd are required to address feedback to this
> proposal, they will need to be merged in systemd.
>
> == Contingency plan ==
>
> * Contingency mechanism: For workstation, owner will revert all
> changes and we’ll go back to using earlyoom instead
> * Contingency deadline: Final freeze
> * Blocks release? No
> * Blocks product? No
>
> == Documentation ==
>
> https://www.freedesktop.org/software/systemd/man/systemd-oomd.html<br />
> https://www.freedesktop.org/software/systemd/man/oomctl.html<br />
> https://www.freedesktop.org/software/systemd/man/oomd.conf.html
>
> == Release Notes ==
>
> systemd-oomd is enabled by default. Depending on which systemd units
> have ManagedOOMSwap=kill or ManagedOOMMemoryPressure=kill,
> systemd-oomd will SIGKILL all the processes under the appropriate
> descendant cgroups when the configured limits are exceeded.
>
> To revert back to earlyoom, run:
>
> <pre>sudo systemctl disable --now systemd-oomd
> sudo systemctl enable --now earlyoom</pre>
> See man oomd.conf for configuration options.
>

This is pretty awesome! :)

Would it also be possible for systemd-oomd to act as a backend for
GIO's GMemoryMonitor API[1]?

[1]: https://developer.gnome.org/gio/2.64/GMemoryMonitor.html

--
真実はいつも一つ!/ Always, there's only one truth!