The patch titled
     Subject: mm, oom, docs: describe the cgroup-aware OOM killer
has been removed from the -mm tree.  Its filename was
     mm-oom-docs-describe-the-cgroup-aware-oom-killer.patch

This patch was dropped because an alternative patch was merged

------------------------------------------------------
From: Roman Gushchin <guro@xxxxxx>
Subject: mm, oom, docs: describe the cgroup-aware OOM killer

Document the cgroup-aware OOM killer.

[guro@xxxxxx: cgroup-aware OOM logic is disabled by default]
Link: http://lkml.kernel.org/r/20171201170149.GB27436@xxxxxxxxxxxxxxxxxxxxxxxxxxx
[mhocko@xxxxxxxx: clarify root memcg oom accounting]
Link: http://lkml.kernel.org/r/20180130122011.GB21609@xxxxxxxxxxxxxx
[akpm@xxxxxxxxxxxxxxxxxxxx: tweak text, fix typo]
Link: http://lkml.kernel.org/r/20171130152824.1591-7-guro@xxxxxx
Signed-off-by: Roman Gushchin <guro@xxxxxx>
Acked-by: Michal Hocko <mhocko@xxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 Documentation/admin-guide/cgroup-v2.rst | 74 ++++++++++++++++++++++
 1 file changed, 74 insertions(+)

--- a/Documentation/admin-guide/cgroup-v2.rst~mm-oom-docs-describe-the-cgroup-aware-oom-killer
+++ a/Documentation/admin-guide/cgroup-v2.rst
@@ -48,6 +48,7 @@ v1 is available under Documentation/cgro
        5-2-1. Memory Interface Files
        5-2-2. Usage Guidelines
        5-2-3. Memory Ownership
+       5-2-4. OOM Killer
      5-3. IO
        5-3-1. IO Interface Files
        5-3-2. Writeback
@@ -1069,6 +1070,31 @@ PAGE_SIZE multiple when read back.
 	high limit is used and monitored properly, this limit's
 	utility is limited to providing the final safety net.
 
+  memory.oom_group
+
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	If set, the OOM killer will consider the memory cgroup as an
+	indivisible memory consumer and compare it with other memory
+	consumers by its memory footprint.
+	If such a memory cgroup is selected as an OOM victim, all
+	processes belonging to it or its descendants will be killed.
+
+	This applies to system-wide OOM conditions and to reaching
+	the hard memory limit of the cgroup or any of its ancestors.
+	If an OOM condition occurs in a descendant cgroup with its own
+	memory limit, the memory cgroup can't be considered
+	as an OOM victim, and the OOM killer will not kill all of its
+	tasks.
+
+	Also, the OOM killer respects the /proc/pid/oom_score_adj value -1000,
+	and will never kill an unkillable task, even if memory.oom_group
+	is set.
+
+	If the cgroup-aware OOM killer is not enabled, an ENOTSUPP error
+	is returned on any attempt to access the file.
+
  memory.events
 	A read-only flat-keyed file which exists on non-root cgroups.
 	The following entries are defined.  Unless specified
@@ -1293,6 +1319,54 @@ to be accessed repeatedly by other cgrou
 POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
 belonging to the affected files to ensure correct memory ownership.
 
+OOM Killer
+~~~~~~~~~~
+
+The cgroup v2 memory controller implements a cgroup-aware OOM killer.
+This means that it treats cgroups as first-class OOM entities.
+
+Cgroup-aware OOM logic is turned off by default and requires
+passing the "groupoom" option when mounting cgroupfs. It can also
+be enabled by remounting cgroupfs with the following command::
+
+  # mount -o remount,groupoom $MOUNT_POINT
+
+Under OOM conditions the memory controller tries to make the best
+choice of a victim, looking for a memory cgroup with the largest
+memory footprint, considering leaf cgroups and cgroups with the
+memory.oom_group option set, which are considered to be indivisible
+memory consumers.
+
+By default, the OOM killer will kill the biggest task in the selected
+memory cgroup. A user can change this behavior by enabling
+the per-cgroup memory.oom_group option. If set, it causes
+the OOM killer to kill all processes attached to the cgroup,
+except processes with oom_score_adj set to -1000.
+
+This affects both system- and cgroup-wide OOMs. For a cgroup-wide OOM
+the memory controller considers only cgroups belonging to the sub-tree
+of the OOM'ing cgroup.
+
+Leaf cgroups and cgroups with the oom_group option set are compared based
+on their cumulative memory usage. The root cgroup is treated as a
+leaf memory cgroup as well, so it is compared with other leaf memory
+cgroups. Due to internal implementation restrictions the size of
+the root cgroup is the cumulative sum of oom_badness of all its tasks
+(in other words, oom_score_adj of each task is obeyed). Relying on
+oom_score_adj (apart from OOM_SCORE_ADJ_MIN) can lead to over- or
+underestimation of the root cgroup consumption and it is therefore
+discouraged. This might change in the future, however.
+
+If there are no cgroups with the memory controller enabled,
+the OOM killer uses the "traditional" process-based approach.
+
+Please note that memory charges are not migrated when tasks
+are moved between different memory cgroups. Moving tasks with a
+significant memory footprint may affect OOM victim selection.
+If this is the case, consider creating a common ancestor for
+the source and destination memory cgroups and enabling oom_group
+on the ancestor level.
+
 IO
 --
 
_

Patches currently in -mm which might be from guro@xxxxxx are

mm-introduce-mem_cgroup_put-helper.patch
cgroup-list-groupoom-in-cgroup-features.patch