Hello, Michal.

On Tue, Jul 12, 2022 at 11:52:11AM +0200, Michal Hocko wrote:
> > Agreed. That's why I don't like reparenting.
> > Reparenting just reparent the charged pages and then redirect the new
> > charge, but can't reparents the 'limit' of the original memcg.
> > So it is a risk if the original memcg is still being charged. We have
> > to forbid the destruction of the original memcg.
>
> yes, I was toying with an idea like that. I guess we really want a
> measure to keep cgroups around if they are bound to a resource which is
> sticky itself. I am not sure how many other resources like BPF (aka
> module like) we already do charge for memcg but considering the
> potential memory consumption just reparenting will not help in general
> case I am afraid.

I think the solution here is an extra cgroup layer to represent
persistent resource tracking. In systemd-speak, a service should have a
cgroup representing the persistent service type and, under it, a cgroup
representing the currently running instance. This way, the user (or
system agent) can clearly distinguish all resources that have ever been
attributed to the service from the resources that are accounted to the
current instance, while also giving visibility into residual resources
for services that are no longer running.

This gives userspace control over what to track and for how long, and
it also fits what the kernel can do in terms of resource tracking. If we
try to do something smart from the kernel side, there are cases which
are inherently insolvable. e.g. if a service instance creates tmpfs /
shmem / whatever, leaves it pinned one way or another and then exits,
and no one actively accesses it afterwards, there is no
userland-visible entity we can reasonably attribute that memory to other
than the parent cgroup.

Thanks.

--
tejun