Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.

Tejun Heo <tj@xxxxxxxxxx> · Tue, 12 Jul 2022 06:24:20 -1000

Hello, Michal.

On Tue, Jul 12, 2022 at 11:52:11AM +0200, Michal Hocko wrote:
> > Agreed. That's why I don't like reparenting.
> > Reparenting just reparent the charged pages and then redirect the new
> > charge, but can't reparents the 'limit' of the original memcg.
> > So it is a risk if the original memcg is still being charged. We have
> > to forbid the destruction of the original memcg.
> 
> yes, I was toying with an idea like that. I guess we really want a
> measure to keep cgroups around if they are bound to a resource which is
> sticky itself. I am not sure how many other resources like BPF (aka
> module like) we already do charge for memcg but considering the
> potential memory consumption just reparenting will not help in general
> case I am afraid.

I think the solution here is an extra cgroup layering to represent
persistent resource tracking. In systemd-speak, a service should have a
cgroup representing a persistent service type and a cgroup representing the
current running instance. This way, the user (or system agent) can clearly
distinguish all resources that have ever been attributed to the service and
the resources that are accounted to the current instance while also giving
visibility into residual resources for services that are no longer running.
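
To make the layering concrete, here is a rough sketch, not an existing
interface. It assumes cgroup v2 with the memory controller enabled for the
service cgroup's children; the helper names and the "foo.service" / "run-1"
names are made up for illustration. The outer cgroup stands for the
persistent service type and the inner one for the current instance. Since
memory.current is hierarchical, whatever the outer cgroup reports beyond the
running instance is the residual usage.

```python
from pathlib import Path


def make_service_cgroups(root, service, instance):
    """Create <root>/<service>/<instance> and return both directories.

    On a real system root would be the cgroup2 mount point (commonly
    /sys/fs/cgroup) and "+memory" would have to be written to the
    service cgroup's cgroup.subtree_control. All names are hypothetical.
    """
    service_dir = Path(root) / service
    instance_dir = service_dir / instance
    instance_dir.mkdir(parents=True, exist_ok=True)
    return service_dir, instance_dir


def read_memory_current(cgroup_dir):
    """Return memory.current in bytes (includes descendants in cgroup v2)."""
    return int((Path(cgroup_dir) / "memory.current").read_text().strip())


def residual_bytes(service_dir, instance_dir=None):
    """Memory charged to the service but not to the running instance.

    With no running instance, everything still charged to the service
    cgroup is residual.
    """
    total = read_memory_current(service_dir)
    if instance_dir is None or not Path(instance_dir).exists():
        return total
    return total - read_memory_current(instance_dir)
```

A system agent could call residual_bytes() on the service cgroup after the
instance cgroup is gone to see what previous instances left behind.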

This gives userspace control over what to track for how long and also fits
what the kernel can do in terms of resource tracking. If we try to do
something smart from the kernel side, there are cases which are inherently
unsolvable, e.g. if a service instance creates tmpfs / shmem / whatever,
leaves it pinned one way or another and then exits, and no one actively
accesses it afterwards, there is no userland-visible entity we can
reasonably attribute that memory to other than the parent cgroup.

Thanks.

-- 
tejun