On Thu, Jul 20, 2023 at 3:12 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Thu, Jul 20, 2023 at 02:34:16PM -0700, Yosry Ahmed wrote:
> > > Or just create a nesting layer so that there's a cgroup which represents the
> > > persistent resources and a nested cgroup instance inside representing the
> > > current instance.
> >
> > In practice it is not easy to know exactly which resources are shared
> > and used by which cgroups, especially in a large dynamic environment.
>
> Yeah, that only covers when resource persistence is confined in a known
> scope. That said, I have a hard time seeing how recharging once after cgroup
> destruction can be a solution for the situations you describe. What if A
> touches it once first, B constantly uses it but C only very occasionally and
> after A dies C ends up owning it due to timing? This is very much possible
> in a large dynamic environment but neither the initial nor the final situation
> is satisfactory.

That is indeed possible, but it would be more likely that the charge
is moved to B. As I said, it's not perfect, but it is an improvement
over what we have today. Even if C ends up owning it, it's better than
staying with the dead A.

>
> To solve the problems you're describing, you actually would have to
> guarantee that memory pages are charged to the current majority user (or
> maybe even spread across current active users). Maybe it can be argued that
> this is a step towards that but it's a very partial step and at least would
> need a technically viable direction that this development can follow.

Right, that would be a much larger effort (arguably memcg v3 ;) ).
This proposal is focused on the painful artifact of the sharing/sticky
resources problem: zombie memcgs. We can extend the automatic charge
movement semantics later to cover more cases or be smarter, or ditch
the existing charging semantics completely and start over with
sharing/stickiness in mind. Either way, that would be a long-term
effort. There is a problem that exists today, though, that ideally can
be fixed or improved by this proposal.

>
> On its own, AFAICS, I'm not sure the scope of problems it can actually solve
> is justifiably greater than what can be achieved with simple nesting.

In our use case nesting is not a viable option. As I said, in a large
fleet where many different workloads are dynamically scheduled on
different machines, there is no way of knowing which resources are
shared among which workloads, and even if we knew, the sharing
wouldn't stay constant. That makes it very difficult to construct a
nested hierarchy that keeps the resources confined. Keep in mind that
the environment is dynamic: workloads are constantly coming and going.
Even if we find the perfect nesting to appropriately scope resources,
some rescheduling may render the hierarchy obsolete and require us to
start over.

>
> Thanks.
>
> --
> tejun