Re: [RFC PATCH 0/8] memory recharging for offline memcgs

Tejun Heo <tj@xxxxxxxxxx> · Fri, 21 Jul 2023 08:26:28 -1000

Hello,

On Fri, Jul 21, 2023 at 11:15:21AM -0700, Yosry Ahmed wrote:
> On Thu, Jul 20, 2023 at 3:31 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
> > memory at least in our case. The sharing across them comes down to things
> > like some common library pages which don't really account for much these
> > days.
> 
> Keep in mind that even a single page charged to a memcg and used by
> another memcg is sufficient to result in a zombie memcg.

I mean, yeah, that's a separate issue or rather a subset which isn't all
that controversial. That can be deterministically solved by reparenting to
the parent like how slab is handled. I think the "deterministic" part is
important here. As you said, even a single page can pin a dying cgroup.

> > > Keep in mind that the environment is dynamic, workloads are constantly
> > > coming and going. Even if find the perfect nesting to appropriately
> > > scope resources, some rescheduling may render the hierarchy obsolete
> > > and require us to start over.
> >
> > Can you please go into more details on how much memory is shared for what
> > across unrelated dynamic workloads? That sounds different from other use
> > cases.
> 
> I am trying to collect more information from our fleet, but the
> application restarting in a different cgroup is not what is happening
> in our case. It is not easy to find out exactly what is going on on

This is the point that Johannes raised but I don't think the current
proposal would make things more deterministic. From what I can see, it
actually pushes it towards even less predictability. Currently, yeah, some
pages may end up in cgroups which aren't the majority user but it at least
is clear how that would happen. The proposed change adds layers of
indeterministic behaviors on top. I don't think that's the direction we want
to go.

> machines and where the memory is coming from due to the
> indeterministic nature of charging. The goal of this proposal is to
> let the kernel handle leftover memory in zombie memcgs because it is
> not always obvious to userspace what's going on (like it's not obvious
> to me now where exactly is the sharing happening :) ).
>
> One thing to note is that in some cases, maybe a userspace bug or
> failed cleanup is a reason for the zombie memcgs. Ideally, this
> wouldn't happen, but it would be nice to have a fallback mechanism in
> the kernel if it does.

I'm not disagreeing on that. Our handling of pages owned by dying cgroups
isn't great but I don't think the proposed change is an acceptable solution.

Thanks.

-- 
tejun