+ memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Tue, 14 Sep 2021 20:46:57 -0700

The patch titled
     Subject: memcg: prohibit unconditional exceeding the limit of dying tasks
has been added to the -mm tree.  Its filename is
     memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Vasily Averin <vvs@xxxxxxxxxxxxx>
Subject: memcg: prohibit unconditional exceeding the limit of dying tasks

The kernel currently allows dying tasks to exceed the memcg limits.  The
allocation is expected to be the last one and the occupied memory will be
freed soon.

This is not always true because the allocation can be part of a huge
vmalloc allocation.  Once allowed, such over-limit charges repeat over and
over again.  Moreover, the lifetime of the allocated object can differ from
the lifetime of the dying task.

Multiple such allocations running concurrently can not only overuse the
memcg limit, but can also lead to a global out-of-memory condition and, in
the worst case, cause the host to panic.

This patch removes the checks that allowed dying tasks to exceed the memcg
limit.  It also breaks the endless loop for tasks bypassed by the oom
killer.  In addition, it renames the should_force_charge() helper to
task_is_dying() because its use no longer leads to a forced charge.
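
For illustration only, the resulting control flow can be sketched as a
small userspace model.  This is not kernel code and not part of the patch:
every model_* name, the victim_pages field and the oom_calls counter are
hypothetical, reclaim is assumed to free nothing, and the __GFP_NOFAIL and
__GFP_RETRY_MAYFAIL handling is left out.  The one behaviour taken from
the diff is that the OOM step reports success for a dying task without
freeing anything, which is why the passed_oom flag is needed.

```c
#include <stdbool.h>

enum model_result { CHARGE_OK, CHARGE_NOMEM };

/* Hypothetical stand-in for a memcg: a page counter with a hard limit. */
struct model_memcg {
	unsigned long usage;
	unsigned long limit;
	unsigned long victim_pages;	/* pages one OOM kill would free */
};

/* Hypothetical stand-in for the state that task_is_dying() inspects. */
struct model_task {
	bool oom_victim;
	bool fatal_signal;
	bool exiting;
};

static bool model_task_is_dying(const struct model_task *t)
{
	return t->oom_victim || t->fatal_signal || t->exiting;
}

/*
 * Models "ret = task_is_dying() || out_of_memory(&oc)": a dying caller
 * makes the OOM step report success without freeing any memory.
 */
static bool model_oom(struct model_memcg *cg, const struct model_task *t)
{
	if (model_task_is_dying(t))
		return true;
	if (cg->victim_pages == 0)
		return false;		/* nothing left to kill */
	cg->usage -= cg->victim_pages;
	cg->victim_pages = 0;
	return true;
}

static enum model_result model_try_charge(struct model_memcg *cg,
					   const struct model_task *t,
					   unsigned long nr_pages,
					   int *oom_calls)
{
	bool passed_oom = false;

	for (;;) {
		if (cg->usage + nr_pages <= cg->limit) {
			cg->usage += nr_pages;
			return CHARGE_OK;
		}

		/* Reclaim is omitted: the model assumes it frees nothing. */

		/* Avoid an endless loop for tasks bypassed by the OOM step. */
		if (passed_oom && model_task_is_dying(t))
			return CHARGE_NOMEM;

		(*oom_calls)++;
		if (!model_oom(cg, t))
			return CHARGE_NOMEM;
		passed_oom = true;
	}
}
```

In this model a dying task that charges one page against a full group
invokes the OOM step once and then fails with CHARGE_NOMEM, leaving the
usage at the limit.  Without the passed_oom check the same call would loop
forever, because model_oom() keeps reporting success for a dying caller.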

Link: https://lkml.kernel.org/r/817a6ce2-4da9-72ac-c5b9-edd398d28a15@xxxxxxxxxxxxx
Signed-off-by: Vasily Averin <vvs@xxxxxxxxxxxxx>
Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memcontrol.c |   27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

--- a/mm/memcontrol.c~memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks
+++ a/mm/memcontrol.c
@@ -242,7 +242,7 @@ enum res_type {
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
-static inline bool should_force_charge(void)
+static inline bool task_is_dying(void)
 {
 	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
 		(current->flags & PF_EXITING);
@@ -1580,7 +1580,7 @@ static bool mem_cgroup_out_of_memory(str
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = should_force_charge() || out_of_memory(&oc);
+	ret = task_is_dying() || out_of_memory(&oc);
 
 unlock:
 	mutex_unlock(&oom_lock);
@@ -2535,6 +2535,7 @@ static int try_charge_memcg(struct mem_c
 	struct page_counter *counter;
 	enum oom_status oom_status;
 	unsigned long nr_reclaimed;
+	bool passed_oom = false;
 	bool may_swap = true;
 	bool drained = false;
 	unsigned long pflags;
@@ -2570,15 +2571,6 @@ retry:
 		goto force;
 
 	/*
-	 * Unlike in global OOM situations, memcg is not in a physical
-	 * memory shortage.  Allow dying and OOM-killed tasks to
-	 * bypass the last charges so that they can exit quickly and
-	 * free their memory.
-	 */
-	if (unlikely(should_force_charge()))
-		goto force;
-
-	/*
 	 * Prevent unbounded recursion when reclaim operations need to
 	 * allocate memory. This might exceed the limits temporarily,
 	 * but we prefer facilitating memory reclaim and getting back
@@ -2635,8 +2627,9 @@ retry:
 	if (gfp_mask & __GFP_RETRY_MAYFAIL)
 		goto nomem;
 
-	if (fatal_signal_pending(current))
-		goto force;
+	/* Avoid endless loop for tasks bypassed by the oom killer */
+	if (passed_oom && task_is_dying())
+		goto nomem;
 
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
@@ -2645,14 +2638,10 @@ retry:
 	 */
 	oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
 		       get_order(nr_pages * PAGE_SIZE));
-	switch (oom_status) {
-	case OOM_SUCCESS:
+	if (oom_status == OOM_SUCCESS) {
+		passed_oom = true;
 		nr_retries = MAX_RECLAIM_RETRIES;
 		goto retry;
-	case OOM_FAILED:
-		goto force;
-	default:
-		goto nomem;
 	}
 nomem:
 	if (!(gfp_mask & __GFP_NOFAIL))
_

Patches currently in -mm which might be from vvs@xxxxxxxxxxxxx are

memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks.patch