[PATCH v2] mm, memcg: avoid oom if cgroup is not populated

Yafang Shao <laoar.shao@xxxxxxxxx> · Tue, 26 Nov 2019 20:28:37 -0500

There is a case where all the processes in a memcg have exited (due to
an OOM group kill or for some other reason), but its file page cache
still exists. This page cache may be protected by memory.min and so
can't be reclaimed. If we can't restart the processes in this memcg, or
don't want to take this memcg offline, then we want to drop the file
page cache.
The advantage of dropping this file cache is that it keeps the reclaimer
(either kswapd or direct reclaim) from scanning and reclaiming pages from
every memcg in the system, because currently the reclaimer reclaims pages
fairly from all memcgs when the system is under memory pressure.
One possible way to drop this file page cache is to set the hard limit
(memory.max) of this memcg to 0. Unfortunately this may invoke the OOM
killer and generate a lot of output, which should not happen.
The admin does not expect OOM output if he or she wants to drop the
caches and knows there are no processes in this memcg.

If the memcg is not populated, we should not invoke the OOM killer because
there's nothing to kill. The next time a new process is started in it, if
max is still below usage, the OOM killer will be invoked and the new
process will be killed. We can consider this lazy OOM, which is what the
kernel has always done.

Fixes: b6e6edcf ("mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage")
Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
---
 mm/memcontrol.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
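
As a userspace illustration (not part of the patch): before writing 0 to
memory.max, an admin tool could confirm that the memcg is unpopulated by
reading the "populated" key of its cgroup.events file. The following is a
minimal sketch, assuming cgroup v2; parse_populated() and
cgroup_populated() are hypothetical helper names, and the cgroup directory
is whatever path the caller supplies.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse the text of a cgroup v2 "cgroup.events" file and return the
 * value of its "populated" key: 1 or 0, or -1 if the key is missing
 * or has an unexpected value.
 */
int parse_populated(const char *events)
{
	const char *line = events;

	while (line && *line) {
		int value;

		/* Only matches when the line starts with "populated". */
		if (sscanf(line, "populated %d", &value) == 1)
			return (value == 0 || value == 1) ? value : -1;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Hypothetical helper: read <cgroup_dir>/cgroup.events and report
 * whether the cgroup is populated. Returns 1, 0, or -1 on error.
 */
int cgroup_populated(const char *cgroup_dir)
{
	char path[4096];
	char buf[256];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "%s/cgroup.events", cgroup_dir);
	f = fopen(path, "r");
	if (!f)
		return -1;

	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	return parse_populated(buf);
}
```

If cgroup_populated() returns 0 for the memcg's directory, no live
processes remain in it or its descendants, so with this patch applied a
write of 0 to memory.max would be expected to reclaim what it can without
producing OOM output.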

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1c4c08b..e936f1b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6139,9 +6139,20 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 			continue;
 		}
 
-		memcg_memory_event(memcg, MEMCG_OOM);
-		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
+		/* If there are no processes, we don't need to invoke the
+		 * OOM killer. The next time a process is started in this
+		 * memcg, max may still be below usage, and the OOM killer
+		 * will be invoked then. This can be considered lazy OOM,
+		 * which is what the kernel has always done, so skipping
+		 * the kill here is consistent with existing behavior.
+		 */
+		if (cgroup_is_populated(memcg->css.cgroup)) {
+			memcg_memory_event(memcg, MEMCG_OOM);
+			if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
+				break;
+		} else {
 			break;
+		}
 	}
 
 	memcg_wb_domain_size_changed(memcg);
-- 
1.8.3.1