On 12/12/24 02:53, Rik van Riel wrote:
> A task already in exit can get stuck trying to allocate pages, if its
> cgroup is at the memory.max limit, the cgroup is using zswap, but
> zswap writeback is disabled, and the remaining memory in the cgroup is
> not compressible.
>
> This seems like an unlikely confluence of events, but it can happen
> quite easily if a cgroup is OOM killed due to exceeding its memory.max
> limit, and all the tasks in the cgroup are trying to exit simultaneously.
>
> When this happens, it can sometimes take hours for tasks to exit,
> as they are all trying to squeeze things into zswap to bring the group's
> memory consumption below memory.max.
>
> Allowing these exiting programs to push some memory from their own
> cgroup into swap allows them to quickly bring the cgroup's memory
> consumption below memory.max, and exit in seconds rather than hours.
>
> Loading this fix as a live patch on a system where a workload got stuck
> exiting allowed the workload to exit within a fraction of a second.
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> ---
>  mm/memcontrol.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7b3503d12aaf..03d77e93087e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5371,6 +5371,15 @@ bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
>  	if (!zswap_is_enabled())
>  		return true;
>
> +	/*
> +	 * Always allow exiting tasks to push data to swap. A process in
> +	 * the middle of exit cannot get OOM killed, but may need to push
> +	 * uncompressible data to swap in order to get the cgroup memory
> +	 * use below the limit, and make progress with the exit.
> +	 */
> +	if ((current->flags & PF_EXITING) && memcg == mem_cgroup_from_task(current))
> +		return true;
> +
>  	for (; memcg; memcg = parent_mem_cgroup(memcg))
>  		if (!READ_ONCE(memcg->zswap_writeback))
>  			return false;

Rik,

I am unable to understand the motivation here. We want
mem_cgroup_zswap_writeback_enabled() to return true, but it only returns
false if a memcg in the hierarchy has zswap_writeback set to 0 (false).
In my git-grep I can't seem to find how or why that would be the case. I
can see that a memcg starts off with the value set to true if
CONFIG_ZSWAP is enabled.

Your changelog above makes sense, but I am unable to map it to the code
changes.

Balbir
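
For reference, the control flow of the patched check can be modelled in
userspace. This is only a sketch under stated assumptions, not the kernel
code: struct memcg_model, struct task_model, writeback_enabled_model() and
demo_result() are hypothetical stand-ins for struct mem_cgroup, current,
mem_cgroup_zswap_writeback_enabled() and a test harness, and the locking
and READ_ONCE() details are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only what the walk needs. */
struct memcg_model {
	struct memcg_model *parent;	/* NULL for the root of the hierarchy */
	bool zswap_writeback;		/* models memcg->zswap_writeback */
};

/* Hypothetical stand-in for the fields of "current" used by the patch. */
struct task_model {
	bool exiting;			/* models current->flags & PF_EXITING */
	struct memcg_model *memcg;	/* models mem_cgroup_from_task(current) */
};

/* Mirrors the order of checks in the patched function. */
static bool writeback_enabled_model(const struct task_model *task,
				    const struct memcg_model *memcg,
				    bool zswap_enabled)
{
	/* With zswap off there is nothing to restrict. */
	if (!zswap_enabled)
		return true;

	/* New in the patch: an exiting task may always swap its own memcg. */
	if (task->exiting && memcg == task->memcg)
		return true;

	/* Any ancestor with writeback disabled vetoes writeback. */
	for (; memcg; memcg = memcg->parent)
		if (!memcg->zswap_writeback)
			return false;

	return true;
}

/*
 * Build root -> parent -> child, with writeback disabled on the parent
 * only, and ask whether a task in the child may write back.
 */
static bool demo_result(bool exiting)
{
	struct memcg_model root = { .parent = NULL, .zswap_writeback = true };
	struct memcg_model parent = { .parent = &root, .zswap_writeback = false };
	struct memcg_model child = { .parent = &parent, .zswap_writeback = true };
	struct task_model task = { .exiting = exiting, .memcg = &child };

	return writeback_enabled_model(&task, &child, true);
}
```

In this model demo_result(false) is false, because the parent has the flag
cleared, while demo_result(true) is true, because the exiting-task check
returns before the hierarchy walk is reached. That early return is the
only behavioural change the patch makes.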