On Fri, May 20, 2022 at 11:16:32PM +0300, Vasily Averin wrote:
> On 5/20/22 10:24, Vasily Averin wrote:
> > On 5/19/22 19:53, Michal Koutný wrote:
> >> On Fri, May 13, 2022 at 06:52:12PM +0300, Vasily Averin <vvs@xxxxxxxxxx> wrote:
> >>> Creating each new cgroup allocates 4Kb for struct cgroup. This is the
> >>> largest memory allocation in this scenario and is especially important
> >>> for small VMs with 1-2 CPUs.
> >>
> >> What do you mean by this argument?
> >>
> >> (On bigger irons, the percpu components become dominant, e.g. struct
> >> cgroup_rstat_cpu.)
> >
> > Michal, Shakeel,
> > thank you very much for your feedback, it helps me understand how to
> > improve the methodology of my accounting analysis.
> > I considered the general case and looked for the places with the largest
> > memory allocations. Now I think it would be better to split all the
> > called allocations into:
> > - a common part, called for any cgroup type (i.e. cgroup_mkdir and
> >   cgroup_create),
> > - per-cgroup parts,
> > and focus on 2 corner cases: single-CPU VMs and "big irons".
> > This helps to clarify which allocations are accounting-important and
> > which ones can be safely ignored.
> >
> > So right now I'm going to redo the calculations and hope it doesn't
> > take long.
>
> common part:  ~11Kb  +  318 bytes percpu
> memcg:        ~17Kb  + 4692 bytes percpu
> cpu:          ~2.5Kb + 1036 bytes percpu
> cpuset:       ~3Kb   +   12 bytes percpu
> blkcg:        ~3Kb   +   12 bytes percpu
> pid:          ~1.5Kb +   12 bytes percpu
> perf:         ~320b  +   60 bytes percpu
> -------------------------------------------
> total:        ~38Kb  + 6142 bytes percpu
> currently accounted:    4668 bytes percpu
>
> Results:
> a) I'll add accounting for cgroup_rstat_cpu and psi_group_cpu; they are
>    allocated in the common part and consume 288 bytes percpu.
> b) It makes sense to add accounting for simple_xattr(), as Michal
>    recommended, especially because it can grow over 4Kb.
> c) It looks like the rest of the allocations can be ignored.
>
> Details are below. Columns are: number of calls, flag, allocation size,
> subtotal, running total, call site
> ('=' -- already accounted, '+' -- to be accounted, '~' -- see KERNFS,
>  '?' -- perhaps later).
>
> common part:
>  16 ~  352   5632   5632  KERNFS (*)
>   1 + 4096   4096   9728  (cgroup_mkdir+0xe4)
>   1    584    584  10312  (radix_tree_node_alloc.constprop.0+0x89)
>   1    192    192  10504  (__d_alloc+0x29)
>   2     72    144  10648  (avc_alloc_node+0x27)
>   2     64    128  10776  (percpu_ref_init+0x6a)
>   1     64     64  10840  (memcg_list_lru_alloc+0x21a)
>
>   1 +  192    192    192  call_site=psi_cgroup_alloc+0x1e
>   1 +   96     96    288  call_site=cgroup_rstat_init+0x5f
>   2     12     24    312  call_site=percpu_ref_init+0x23
>   1      6      6    318  call_site=__percpu_counter_init+0x22

I'm curious, how do you generate this data?

Just an idea: it could be a nice tool, placed somewhere in tools/cgroup/...

Thanks!
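
For reference: the call_site= lines above look like raw output of the kernel's
kmem and percpu tracepoints, so one plausible way to collect this kind of data
(a sketch only, not necessarily how the numbers above were actually produced)
is to enable those tracepoints around a single cgroup mkdir and aggregate the
events by call site. The sketch assumes tracefs is mounted at
/sys/kernel/tracing, cgroup2 at /sys/fs/cgroup, and a kernel where
percpu_alloc_percpu reports call_site (added around v5.19); the "trace-test"
cgroup name is purely illustrative.

#!/bin/sh
# Sketch: trace the allocations triggered by one cgroup mkdir.
# Assumes tracefs at /sys/kernel/tracing and cgroup2 at /sys/fs/cgroup.
cd /sys/kernel/tracing

echo 0 > tracing_on
echo > trace                                  # clear the ring buffer

# Slab allocations: kmalloc and kmem_cache_alloc both report call_site,
# bytes_req and bytes_alloc; percpu allocations report call_site and size.
echo 1 > events/kmem/kmalloc/enable
echo 1 > events/kmem/kmem_cache_alloc/enable
echo 1 > events/percpu/percpu_alloc_percpu/enable

echo 1 > tracing_on
mkdir /sys/fs/cgroup/trace-test               # the operation being measured
echo 0 > tracing_on

echo 0 > events/kmem/kmalloc/enable
echo 0 > events/kmem/kmem_cache_alloc/enable
echo 0 > events/percpu/percpu_alloc_percpu/enable

# Count events per call site; byte totals like the table above can be
# obtained by summing the bytes_alloc=/size= fields with awk.
grep -o 'call_site=[^ /]*' trace | sort | uniq -c | sort -rn

rmdir /sys/fs/cgroup/trace-test

On a busy system the trace would also need filtering against unrelated
allocations (e.g. via set_event_pid, or by diffing against an idle run), but
wrapping something like this into a script under tools/cgroup/ seems doable.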