On Thu, Dec 12, 2024 at 9:07 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Thu, Dec 12, 2024 at 8:58 AM Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> I still think maybe this needs to be fixed on the memcg side, at least
> by not making exiting tasks try really hard to reclaim memory to the
> point where this becomes a problem. IIUC there could be other reasons
> why reclaim may take too long, but maybe not as pathological as this
> case to be fair. I will let the memcg maintainers chime in for this.

FWIW, we did have some internal discussions regarding this. We think
that, for now, this is a good-enough stopgap solution - it remains to be
seen whether other, more "permanent" fixes are needed, and whether they
would not also regress other scenarios. They are definitely more
complicated than the solution Rik is proposing here :)

>
> If there's a fundamental reason why this cannot be fixed on the memcg
> side, I don't object to this change.
>
> Nhat, any objections on your end? I think your fleet workloads were
> the first users of this interface. Does this break their expectations?

I had similar concerns as yours, so we rolled the solution out to the
hosts in trouble. AFAICS:

1. It allowed the pathological workload to make forward progress with
its exit procedure.

2. The other workloads (which also have memory.zswap.writeback disabled)
did not observe any regression.
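
For context (not part of Rik's patch), this is roughly how those
workloads opt out of zswap writeback: the per-cgroup cgroup v2 knob
takes 0/1, and the cgroup path below is just an illustration, not one
of our real slices:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Illustrative path; the real cgroup name depends on the setup. */
	const char *path =
		"/sys/fs/cgroup/workload.slice/memory.zswap.writeback";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* "0" disables zswap writeback for this cgroup, "1" re-enables it. */
	if (write(fd, "0", 1) != 1) {
		perror("write");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}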