On Thu, Dec 12, 2024 at 10:03 AM Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> On Thu, 2024-12-12 at 09:51 -0800, Shakeel Butt wrote:
> >
> > The fundamental issue is that the exiting process (killed by oomd or
> > simple exit) has to allocate memory but the cgroup is at its limit
> > and the reclaim is very, very slow.
> >
> > I can see attacking this issue from multiple angles.
>
> Besides your proposed ideas, I suppose we could also limit
> the gfp_mask of an exiting reclaimer with e.g. __GFP_NORETRY,
> but I do not know how effective that would be, since a single
> pass through the memory reclaim code was still taking dozens
> of seconds when I traced the "stuck" workloads.

I know we already discussed this, but it'd be nice if we could let the
exiting task go ahead with the page fault and bypass the memory limits,
if the page fault is crucial for it to make forward progress (rough
sketch of what I have in mind at the end of this mail [*]). Not sure
how feasible that is, or how to decide which page faults are really
crucial though :)

For the pathological memory.zswap.writeback disabling case in
particular, another thing we could do here is make these incompressible
pages ineligible for further reclaim attempts, either by putting them
on a non-reclaim LRU, or by putting them on the zswap LRU to maintain
the total ordering of the LRUs. That way we can move on to other
sources (slab caches, for example) sooner, or fail earlier. That said,
it remains to be seen what will happen if these incompressible pages
are literally all that is left...

I'm biased toward this idea though, because it has other benefits.
Maybe I'm just looking for excuses to revive the project ;)
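
[*] A very rough sketch of the bypass idea, purely illustrative - the
helper name is made up, and the fatal-signal/PF_EXITING test is just my
guess at "this task is on its way out", not an existing upstream check:

	/* illustrative only - would live somewhere in mm/memcontrol.c */
	static bool task_should_bypass_limit(void)
	{
		/*
		 * A dying or exiting task is about to give its memory
		 * back anyway; stalling it in reclaim at the memcg
		 * limit only delays that.
		 */
		return fatal_signal_pending(current) ||
		       (current->flags & PF_EXITING);
	}

and then early in the charge path, before we enter reclaim:

		if (task_should_bypass_limit())
			goto force;	/* charge past the limit, skip reclaim */

The hard part is still the one mentioned above: telling apart the page
faults the exit path actually needs from the ones it does not.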