On 5/3/22 00:22, Michal Koutný wrote:
> When struct mem_cgroup charging was introduced, there was a similar
> discussion [1].

Thank you, I had missed this patch; it was very interesting and useful.
I would note, though, that OpenVZ and LXC have another use case: we run
separate and independent systemd instances inside OS containers. So the
container's cgroups are created not in the host's root memcg but inside
the accountable root memcg of the container.

> I can see following aspects here:
> 1) absolute size of kernfs_objects,
> 2) practical difference between a) and b),
> 3) consistency with memcg,
> 4) v1 vs v2 behavior.
...
> How do these reasonings align with your original intention of net
> devices accounting? (Are the creators of net devices inside the
> container?)

It is possible to create a netdevice in one namespace/container and then
move it to another one, and this possibility is widely used. With my
patch, memory allocated for such devices will not be accounted to the
new memcg after the move; however, I do not think that is a problem. My
patches are mainly meant to protect the host from misuse, e.g. when
someone creates a huge number of netdevices inside a container.

>> Do you think it is incorrect and new kernfs node should be accounted
>> to memcg of parent cgroup, as mem_cgroup_css_alloc()-> mem_cgroup_alloc() does?
>
> I don't think either variant is incorrect. I'd very much prefer the
> consistency with memcg behavior (variant a)) but as I've listed the
> arguments above, it seems such a consistency can't be easily justified.

From my point of view, the most important thing is to account the
allocated memory to some cgroup inside the container. Selecting the
proper memcg is a secondary goal here. Frankly speaking, I do not see a
big difference between the memcg of the current process, the memcg of
the newly created child, and the memcg of its parent.

As far as I understand, Roman chose the parent memcg because it was a
special case of creating a new memory group.
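A toy model may make the point about the three candidates clearer. This is a Python sketch, not kernel code: MemCg, Task, charge_target() and the cgroup names below are hypothetical. The only kernel behavior it assumes is that an accounted (__GFP_ACCOUNT) allocation is charged to the task's active-memcg override when one is set and to the task's own memcg otherwise, while an unaccounted allocation is charged nowhere. Whichever of "current", "parent" or "child" is picked, the charge stays inside the container's subtree; only unaccounted allocations escape it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class MemCg:
    """Hypothetical stand-in for struct mem_cgroup: a name and a parent link."""
    name: str
    parent: Optional["MemCg"] = None

    def is_descendant_of(self, root: "MemCg") -> bool:
        # Walk up the hierarchy; a memcg counts as its own descendant.
        node: Optional[MemCg] = self
        while node is not None:
            if node is root:
                return True
            node = node.parent
        return False


@dataclass(eq=False)
class Task:
    """Hypothetical stand-in for a task: its own memcg plus an optional override."""
    memcg: MemCg
    active_memcg: Optional[MemCg] = None


def charge_target(task: Task, accounted: bool) -> Optional[MemCg]:
    """Assumed rule: unaccounted allocations are charged nowhere;
    otherwise the active-memcg override wins over the task's own memcg."""
    if not accounted:
        return None
    return task.active_memcg if task.active_memcg is not None else task.memcg


# Hypothetical hierarchy: host root -> container root -> two children.
host_root = MemCg("host-root")
ct_root = MemCg("ct-root", host_root)
init_scope = MemCg("ct-root/init.scope", ct_root)
new_child = MemCg("ct-root/svc", ct_root)  # the cgroup being created by mkdir

# systemd inside the container runs in init.scope and creates ct-root/svc.
systemd = Task(memcg=init_scope)

# Variant "current": the kernfs node is charged to the caller's memcg.
current_target = charge_target(systemd, accounted=True)

# Variant "parent": temporarily override the active memcg with the parent
# of the new cgroup, then restore the previous value.
saved = systemd.active_memcg
systemd.active_memcg = new_child.parent
parent_target = charge_target(systemd, accounted=True)
systemd.active_memcg = saved

# Variant "child": charge to the newly created memcg itself.
child_target = new_child

# An unaccounted allocation is not charged to any memcg at all.
unaccounted_target = charge_target(systemd, accounted=False)

for label, target in [("current", current_target),
                      ("parent", parent_target),
                      ("child", child_target)]:
    inside = target is not None and target.is_descendant_of(ct_root)
    print(f"{label:8} -> {target.name:20} inside container: {inside}")
print("unaccounted ->", unaccounted_target)
```

All three accounted variants land under ct-root, so the container's limits apply in each case; the difference between them only matters for how the charge is split among the container's own cgroups.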
Roman temporarily switched the active memcg in mem_cgroup_css_alloc()
and properly accounted all the required memcg-specific allocations.
However, he did not account the rather large struct mem_cgroup itself,
so I think we need not worry about the 128 bytes of a kernfs node. Yes,
it will be accounted to some other memcg, but the same thing happens
with the kernfs nodes of other groups. I don't think that's a problem.

>> Perhaps you mean that in this case kernfs should not be counted at all,
>> as almost all neighboring allocations do?
>
> No, I think it wouldn't help here [2]. (Or which neighboring allocations
> do you mean? There must be at least nr_cgroups of them.)

Primarily I mean the struct mem_cgroup allocation in mem_cgroup_alloc().
However, I think we need to take into account all the other allocations
made inside cgroup_mkdir(): struct cgroup and the kernfs node in the
common part, and any other cgroup-specific allocations in the other
.css_alloc functions. They can all be called from inside a container,
they allocate unaccounted memory, and so they could theoretically be
misused. I'm going to check this scenario a bit later.

Thank you,
Vasily Averin