On Tue, Jan 16, 2024 at 01:45:47PM -0800, Andrew Morton wrote:
>
> The patch titled
>      Subject: mm: memcontrol: don't throttle dying tasks on memory.high
> has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
>      mm-memcontrol-dont-throttle-dying-tasks-on-memoryhigh.patch
>
> This patch will shortly appear at
>      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-memcontrol-dont-throttle-dying-tasks-on-memoryhigh.patch

Hi Andrew,

there is an updated version from Johannes in the same thread. It seems
like you've picked the original version. Please pick the new one instead.

Thank you!

>
> This patch will later appear in the mm-hotfixes-unstable branch at
>     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
>
> The -mm tree is included into linux-next via the mm-everything
> branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> and is updated there every 2-3 working days
>
> ------------------------------------------------------
> From: Johannes Weiner <hannes@xxxxxxxxxxx>
> Subject: mm: memcontrol: don't throttle dying tasks on memory.high
> Date: Thu, 11 Jan 2024 08:29:02 -0500
>
> While investigating hosts with high cgroup memory pressures, Tejun
> found culprit zombie tasks that were holding on to a lot of
> memory, had SIGKILL pending, but were stuck in memory.high reclaim.
>
> In the past, we used to always force-charge allocations from tasks
> that were exiting in order to accelerate them dying and freeing up
> their rss. This changed for memory.max in a4ebf1b6ca1e ("memcg:
> prohibit unconditional exceeding the limit of dying tasks"); it noted
> that this can cause (userspace inducible) containment failures, so it
> added a mandatory reclaim and OOM kill cycle before forcing charges.
> At the time, memory.high enforcement was handled in the userspace
> return path, which isn't reached by dying tasks, and so memory.high
> was still never enforced by dying tasks.
>
> When c9afe31ec443 ("memcg: synchronously enforce memory.high for large
> overcharges") added synchronous reclaim for memory.high, it added
> unconditional memory.high enforcement for dying tasks as well. The
> callstack shows that this path is where the zombie is stuck in.
>
> We need to accelerate dying tasks getting past memory.high, but we
> cannot do it quite the same way as we do for memory.max: memory.max is
> enforced strictly, and tasks aren't allowed to move past it without
> FIRST reclaiming and OOM killing if necessary. This ensures very small
> levels of excess. With memory.high, though, enforcement happens lazily
> after the charge, and OOM killing is never triggered. A lot of
> concurrent threads could have pushed, or could actively be pushing,
> the cgroup into excess. The dying task will enter reclaim on every
> allocation attempt, with little hope of restoring balance.
>
> To fix this, skip synchronous memory.high enforcement on dying tasks
> altogether again. Update memory.high path documentation while at it.
>
> Link: https://lkml.kernel.org/r/20240111132902.389862-1-hannes@xxxxxxxxxxx
> Fixes: c9afe31ec443 ("memcg: synchronously enforce memory.high for large overcharges")
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Reported-by: Tejun Heo <tj@xxxxxxxxxx>
> Reviewed-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Acked-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
> Acked-by: Roman Gushchin <roman.gushchin@xxxxxxxxx>
> Cc: Dan Schatzberg <schatzberg.dan@xxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Muchun Song <muchun.song@xxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
>  mm/memcontrol.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
>
> --- a/mm/memcontrol.c~mm-memcontrol-dont-throttle-dying-tasks-on-memoryhigh
> +++ a/mm/memcontrol.c
> @@ -2623,8 +2623,9 @@ static unsigned long calculate_high_dela
>  }
>
>  /*
> - * Scheduled by try_charge() to be executed from the userland return path
> - * and reclaims memory over the high limit.
> + * Reclaims memory over the high limit. Called directly from
> + * try_charge() when possible, but also scheduled to be called from
> + * the userland return path where reclaim is always able to block.
>   */
>  void mem_cgroup_handle_over_high(gfp_t gfp_mask)
>  {
> @@ -2693,6 +2694,9 @@ retry_reclaim:
>  	}
>
>  	/*
> +	 * Reclaim didn't manage to push usage below the limit, slow
> +	 * this allocating task down.
> +	 *
>  	 * If we exit early, we're guaranteed to die (since
>  	 * schedule_timeout_killable sets TASK_KILLABLE). This means we don't
>  	 * need to account for any ill-begotten jiffies to pay them off later.
> @@ -2887,8 +2891,22 @@ done_restock:
>  		}
>  	} while ((memcg = parent_mem_cgroup(memcg)));
>
> +	/*
> +	 * Reclaim is scheduled for the userland return path already,
> +	 * but also attempt synchronous reclaim to avoid excessive
> +	 * overrun while the task is still inside the kernel. If this
> +	 * is successful, the return path will see it when it rechecks
> +	 * the overage, and simply bail out.
> +	 *
> +	 * Skip if the task is already dying, though. Unlike
> +	 * memory.max, memory.high enforcement isn't as strict, and
> +	 * there is no OOM killer involved, which means the excess
> +	 * could already be much bigger (and still growing) than it
> +	 * could for memory.max; the dying task could get stuck in
> +	 * fruitless reclaim for a long time, which isn't desirable.
> +	 */
>  	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
> -	    !(current->flags & PF_MEMALLOC) &&
> +	    !(current->flags & PF_MEMALLOC) && !task_is_dying() &&
>  	    gfpflags_allow_blocking(gfp_mask)) {
>  		mem_cgroup_handle_over_high(gfp_mask);
>  	}
> _
>
> Patches currently in -mm which might be from hannes@xxxxxxxxxxx are
>
> mm-memcontrol-dont-throttle-dying-tasks-on-memoryhigh.patch
>
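
For anyone skimming the thread, the gating condition in the last hunk
can be modelled in userspace roughly as below. This is only a sketch
under stated assumptions, not kernel code: struct task_model,
should_reclaim_high_sync() and CHARGE_BATCH_MODEL are names made up for
illustration, and the batch value of 64 is an assumption standing in
for MEMCG_CHARGE_BATCH.

```c
#include <stdbool.h>

/* Assumption: illustrative stand-in for MEMCG_CHARGE_BATCH. */
#define CHARGE_BATCH_MODEL 64U

/* Hypothetical per-task state, modelling what try_charge() consults. */
struct task_model {
	unsigned int nr_pages_over_high; /* models current->memcg_nr_pages_over_high */
	bool memalloc;                   /* models current->flags & PF_MEMALLOC */
	bool dying;                      /* models task_is_dying() */
};

/*
 * Returns true when synchronous memory.high reclaim would be attempted.
 * The !t->dying term is the check this patch adds; the other three
 * terms model the pre-existing condition.
 */
static bool should_reclaim_high_sync(const struct task_model *t, bool can_block)
{
	return t->nr_pages_over_high > CHARGE_BATCH_MODEL &&
	       !t->memalloc && !t->dying && can_block;
}
```

With this model, a task that is 100 pages over the limit and allowed to
block enters synchronous reclaim, while the same task with dying set
skips it. Per the patch description, reclaim stays scheduled for the
userland return path, which a dying task does not reach.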