Re: [PATCH] memcg: allow exiting tasks to write back data to swap

Rik van Riel <riel@xxxxxxxxxxx> · Wed, 11 Dec 2024 12:49:24 -0500

On Wed, 2024-12-11 at 09:30 -0800, Yosry Ahmed wrote:
> On Wed, Dec 11, 2024 at 9:20 AM Rik van Riel <riel@xxxxxxxxxxx>
> wrote:
> > 
> > On Wed, 2024-12-11 at 09:00 -0800, Yosry Ahmed wrote:
> > > On Wed, Dec 11, 2024 at 8:34 AM Rik van Riel <riel@xxxxxxxxxxx>
> > > wrote:
> > > > > 
> > > > If it is a kernel directed memcg OOM kill, that is
> > > > true.
> > > > 
> > > > However, if the exit comes from somewhere else,
> > > > like a userspace oomd kill, we might not hit that
> > > > code path.
> > > 
> > > Why do we treat dying tasks differently based on the source of
> > > the
> > > kill?
> > > 
> > Are you saying we should fail allocations for
> > every dying task, and add a check for PF_EXITING
> > in here?
> 
> I am asking, not really suggesting anything :)
> 
> Does it matter from the kernel perspective if the task is dying due
> to
> a kernel OOM kill or a userspace SIGKILL?
> 
Currently, it does. I'm not sure it should, but
currently it does :/

We are dealing with two conflicting demands here.

On the one hand, we want the exit code to be able
to access things like futex memory, so it can
properly clean up everything the program left behind.

On the other hand, we don't want the exiting
program to drive up cgroup memory use, especially
not with memory that won't be reclaimed by the
exit.

My patch is an attempt to satisfy both of these
demands, in situations where we currently exhibit
a rather pathological behavior (glacially slow
exit).

-- 
All Rights Reversed.