On Thu, 2024-12-12 at 09:51 -0800, Shakeel Butt wrote:
> 
> The fundamental issue is that the exiting process (killed by oomd or
> simple exit) has to allocate memory but the cgroup is at limit and
> the reclaim is very very slow.
> 
> I can see attacking this issue with multiple angles.

Besides your proposed ideas, I suppose we could also limit the gfp_mask
of an exiting reclaimer with e.g. __GFP_NORETRY, but I do not know how
effective that would be, since a single pass through the memory reclaim
code was still taking dozens of seconds when I traced the "stuck"
workloads.

-- 
All Rights Reversed.