On 12/13/24 07:00, Rik van Riel wrote: > On Thu, 12 Dec 2024 18:31:57 +0000 > Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote: > >> Is it about a single task or groups of tasks or the entire cgroup? >> If former, why it's a problem? A tight memcg limit can slow things down >> in general and I don't see why we should treat the exit() path differently. >> > I think the exit path does need to be treated a little differently, > since this exit may be the only way such a cgroup can free up memory. > >> If it's about the entire cgroup and we have essentially a deadlock, >> I feel like we need to look into the oom reaper side. > > You mean something like the below? > > I have not tested it yet, because we don't have any stuck > cgroups right now among the workloads that I'm monitoring. > > ---8<--- > > From c0e545fd45bd3ee24cd79b3d3e9b375e968ef460 Mon Sep 17 00:00:00 2001 > From: Rik van Riel <riel@xxxxxxxxxxx> > Date: Thu, 12 Dec 2024 14:50:49 -0500 > Subject: [PATCH] memcg,oom: speed up reclaim for exiting tasks > > When a memcg reaches its memory limit, and reclaim becomes unavailable > or slow for some reason, for example only zswap is available, but zswap > writeback is disabled, it can take a long time for tasks to exit, and > for the cgroup to get back to normal (or cleaned up). > > Speed up memcg reclaim for exiting tasks by limiting how much work > reclaim does, and by invoking the OOM reaper if reclaim does not > free up enough memory to allow the task to make progress. > > Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx> > --- > include/linux/oom.h | 8 ++++++++ > mm/memcontrol.c | 11 +++++++++++ > mm/oom_kill.c | 6 +----- > 3 files changed, 20 insertions(+), 5 deletions(-) > > diff --git a/include/linux/oom.h b/include/linux/oom.h > index 1e0fc6931ce9..b2d9cf936664 100644 > --- a/include/linux/oom.h > +++ b/include/linux/oom.h > @@ -111,4 +111,12 @@ extern void oom_killer_enable(void); > > extern struct task_struct *find_lock_task_mm(struct task_struct *p); > > +#ifdef CONFIG_MMU > +extern void queue_oom_reaper(struct task_struct *tsk); > +#else > +static intern void queue_oom_reaper(struct task_struct *tsk) > +{ > +} > +#endif > + > #endif /* _INCLUDE_LINUX_OOM_H */ > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 7b3503d12aaf..21f42758d430 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2231,6 +2231,9 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, > if (!gfpflags_allow_blocking(gfp_mask)) > goto nomem; > > + if (unlikely(current->flags & PF_EXITING)) > + gfp_mask |= __GFP_NORETRY; > + if (task_is_dying()) gfp_mask |= __GFP_NORETRY > memcg_memory_event(mem_over_limit, MEMCG_MAX); > raised_max_event = true; > > @@ -2284,6 +2287,14 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, > goto retry; > } > nomem: > + /* > + * We ran out of memory while inside exit. Maybe the OOM > + * reaper can help reduce cgroup memory use and get things > + * moving again? > + */ > + if (unlikely(current->flags & PF_EXITING)) > + queue_oom_reaper(current); > + I am not sure this is helpful without task_will_free_mem(), the dying task shouldn't get sent to the OOM killer when we run out of memory. I did notice that we have heuristics around task_is_dying() and passed_oom, sounds like the end result of your changes would be to ignore the heuristics of passed_oom Balbir Singh.