Hello,

On Fri, Aug 19, 2022 at 08:59:20AM +0800, Yafang Shao wrote:
> On Fri, Aug 19, 2022 at 6:20 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
> > memcg folks would have better informed opinions but from generic cgroup pov
> > I don't think this is a good direction to take. This isn't a problem limited
> > to bpf progs and it doesn't make a whole lot of sense to solve this for bpf.
>
> This change is bpf specific. It doesn't refactor a whole lot of things.

I'm not sure what point the above sentence is making. It may not change a
lot of code, but it does introduce a significant new mode of operation
which affects memcg and cgroup in general.

> > We have the exact same problem for any resources which span multiple
> > instances of a service, including page cache, tmpfs instances and any
> > other thing which can persist longer than process lifetime. My current
> > opinion is that this is best solved by introducing an extra cgroup layer
> > to represent the persistent entity and putting the per-instance cgroup
> > under it.
>
> It is not practical on k8s.
> Because, before the persistent entity, the cgroup dir is stateless.
> After, it is stateful.
> Pls, don't continue keeping blind eyes on k8s.

Can you please elaborate on why it isn't practical for k8s? I don't know
the details of k8s, and what you wrote above is not a detailed enough
technical argument.

> > It does require reorganizing how things are organized from userspace POV,
> > but the end result is really desirable. We get entities accurately
> > representing what needs to be tracked, and control over the granularity
> > of accounting and control (e.g. folks who don't care about telling apart
> > the current instance's usage can simply not enable controllers at the
> > persistent entity level).
>
> Pls also think about why k8s refuses to use cgroup2.

This attitude really bothers me. You aren't spelling it out fully, but
instead of engaging in the technical argument at hand, you're putting
forth upstream conforming to k8s's current assumptions and behaviors as a
requirement, and then insisting that it's upstream's fault that k8s is
staying with cgroup1. This is not an acceptable form of argument, and it
would be irresponsible to grant any kind of weight to this line of
reasoning.

k8s may seem like the world to you, but it is one of many use cases of the
upstream kernel. We all should pay attention to the use cases and
technical arguments to determine how we chart our way forward, but being
k8s or whoever else clearly isn't a waiver to make this kind of unilateral
demand. It's okay to emphasize the gravity of the specific use case at
hand, but please realize that it's one of many factors that should be
considered, and sometimes one which can and should get trumped by others.

Thanks.

--
tejun
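
P.S. To make the layering above a bit more concrete, here is a minimal
sketch of what the proposed organization could look like through the
cgroup2 filesystem interface. The service name, instance name, and the
choice to enable the memory controller per instance are all made up for
illustration; this is a rough sketch of the idea, not a prescription.

#!/usr/bin/env python3
# Rough sketch of the "persistent entity + per-instance child" layout.
# Needs root and a cgroup2 mount; "web.service" and "instance-42" are
# hypothetical names chosen only for this example.
import os

CGROUP_ROOT = "/sys/fs/cgroup"

# Persistent entity: outlives any single instance of the service and is
# where long-lived charges (page cache, tmpfs, pinned bpf progs, ...) can
# keep being accounted across restarts.
persistent = os.path.join(CGROUP_ROOT, "web.service")

# Per-instance cgroup: created for each run, removed when the instance exits.
instance = os.path.join(persistent, "instance-42")

os.makedirs(instance, exist_ok=True)

# Optional: enabling the memory controller below the persistent level gives
# a per-instance breakdown. Folks who only care about the service as a whole
# can skip this write and just read memory.stat at the persistent level.
with open(os.path.join(persistent, "cgroup.subtree_control"), "w") as f:
    f.write("+memory")

# A new instance's processes would then be started inside `instance` by
# writing their PIDs to its cgroup.procs file.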