Hello. On Mon, May 02, 2022 at 10:37:49PM +0300, Vasily Averin <vvs@xxxxxxxxxx> wrote: > I did not understand your statement. Could you please explain it in more details? Sure, let me expand my perhaps ambiguous and indefinite sentence. > I see that cgroup_mkdir()->cgroup_create() creates new kernfs node for new > sub-directory, and with my patch account memory of kernfs node to memcg > of current process. Indeed. The variants I'm comparing here are: a) charge to the creator's memcg, b) charge to the parent (memcg ancestor) of created cgroup. When struct mem_cgroup charging was introduced, there was a similar discussion [1]. I can see following aspects here: 1) absolute size of kernfs_objects, 2) practical difference between a) and b), 3) consistency with memcg, 4) v1 vs v2 behavior. Ad 1) -- normally, I'd treat this as negligible (~120B struct kernfs_node * there are ~10 of them per subsys * ~10 subsystems ~ 12 KB/cgroup). But I guess the point of this change are exploitative users where this doesn't hold [2], so absolute size is not so important. Ad 2) -- in the typical workloads, only top-level cgroup are created by some management entity and lower level are managed from within, i.e. there is little difference whom to charge the created objects. Ad 3) -- struct mem_cgroup objects are charged to their hierarchical parent, so that dying memcgs can be associated to a subtree which is where the reclaim can deal with it (in contrast with creator's cgroup). Now, if I'm looking correctly, the kernfs_node objects are not pinned by any residual state (subsystems kill_css()->css_clear_dir() synchronously from rmdir, cgroup itself may be RCU delayed). So the memcg argument remains purely for consistency (but no practical reason). Ad 4) -- the variant b) becomes slightly awkward when mkdir'ing a cgroup in a non-memcg hierarchy (bubbles up to root, despite creator in a non-root memcg). How do these reasonings align with your original intention of net devices accounting? (Are the creators of net devices inside the container?) > Do you think it is incorrect and new kernfs node should be accounted > to memcg of parent cgroup, as mem_cgroup_css_alloc()-> mem_cgroup_alloc() does? I don't think either variant is incorrect. I'd very much prefer the consistency with memcg behavior (variant a)) but as I've listed the arguments above, it seems such a consistency can't be easily justified. > Perhaps you mean that in this case kernfs should not be counted at all, > as almost all neighboring allocations do? No, I think it wouldn't help here [2]. (Or which neighboring allocations do you mean? There must be at least nr_cgroups of them.) (Of course, then there's the traditional performance argument, cgroup's kernfs_node object shouldn't be problematic but I can't judge others (sysfs) but that's nothing to prevent any form of kernfs_node accounting going forward in my eyes.) HTH, Michal [1] https://lore.kernel.org/all/20200729171039.GA22229@xxxxxxxxxxxxxxxxx/ [2] Unless this could be constraint by something even bigger and accounted. But only struct mem_cgroup (recursively its percpu stats) comes to my mind.