On 5/20/22 10:24, Vasily Averin wrote:
> On 5/19/22 19:53, Michal Koutný wrote:
>> On Fri, May 13, 2022 at 06:52:12PM +0300, Vasily Averin <vvs@xxxxxxxxxx> wrote:
>>> Creating each new cgroup allocates 4Kb for struct cgroup. This is the
>>> largest memory allocation in this scenario and is epecially important
>>> for small VMs with 1-2 CPUs.
>>
>> What do you mean by this argument?
>>
>> (On bigger irons, the percpu components becomes dominant, e.g. struct
>> cgroup_rstat_cpu.)
>
> Michal, Shakeel,
> thank you very much for your feedback, it helps me understand how to improve
> the methodology of my accounting analyze.
> I considered the general case and looked for places of maximum memory allocations.
> Now I think it would be better to split all called allocations into:
> - common part, called for any cgroup type (i.e. cgroup_mkdir and cgroup_create),
> - per-cgroup parts,
> and focus on 2 corner cases: for single CPU VMs and for "big irons".
> It helps to clarify which allocations are accounting-important and which ones
> can be safely ignored.
>
> So right now I'm going to redo the calculations and hope it doesn't take long.

common part:         ~11Kb  +  318 bytes percpu
memcg:               ~17Kb  + 4692 bytes percpu
cpu:                 ~2.5Kb + 1036 bytes percpu
cpuset:              ~3Kb   +   12 bytes percpu
blkcg:               ~3Kb   +   12 bytes percpu
pid:                 ~1.5Kb +   12 bytes percpu
perf:                ~320b  +   60 bytes percpu
-------------------------------------------
total:               ~38Kb  + 6142 bytes percpu
currently accounted:          4668 bytes percpu

Results:
a) I'll add accounting for cgroup_rstat_cpu and psi_group_cpu,
   they are allocated in the common part and consume 288 bytes percpu.
b) It makes sense to add accounting for simple_xattr(), as Michal recommended,
   especially because it can grow over 4Kb.
c) It looks like the rest of the allocations can be ignored.

Details are below
('=' -- already accounted, '+' -- to be accounted, '~' -- see KERNFS, '?' -- perhaps later)

common part:
    16 ~   352   5632   5632 KERNFS (*)
     1 +  4096   4096   9728 (cgroup_mkdir+0xe4)
     1     584    584  10312 (radix_tree_node_alloc.constprop.0+0x89)
     1     192    192  10504 (__d_alloc+0x29)
     2      72    144  10648 (avc_alloc_node+0x27)
     2      64    128  10776 (percpu_ref_init+0x6a)
     1      64     64  10840 (memcg_list_lru_alloc+0x21a)

     1 +   192    192    192 call_site=psi_cgroup_alloc+0x1e
     1 +    96     96    288 call_site=cgroup_rstat_init+0x5f
     2      12     24    312 call_site=percpu_ref_init+0x23
     1       6      6    318 call_site=__percpu_counter_init+0x22

(*) KERNFS includes:
     1 +   128 (__kernfs_new_node+0x4d)   kernfs node
     1 +    88 (__kernfs_iattrs+0x57)     kernfs iattrs
     1 +    96 (simple_xattr_alloc+0x28)  simple_xattr_alloc() that can grow over 4Kb
     1 ?    32 (simple_xattr_set+0x59)
     1       8 (__kernfs_new_node+0x30)

memory:
------
     1 +  8192   8192   8192 (mem_cgroup_css_alloc+0x4a)
    14 ~   352   4928  13120 KERNFS
     1 +  2048   2048  15168 (mem_cgroup_css_alloc+0xdd)
     1    1024   1024  16192 (alloc_shrinker_info+0x79)
     1     584    584  16776 (radix_tree_node_alloc.constprop.0+0x89)
     2      64    128  16904 (percpu_ref_init+0x6a)
     1      64     64  16968 (mem_cgroup_css_online+0x32)

     1 =  3684   3684   3684 call_site=mem_cgroup_css_alloc+0x9e
     1 =   984    984   4668 call_site=mem_cgroup_css_alloc+0xfd
     2      12     24   4692 call_site=percpu_ref_init+0x23

cpu:
---
     5 ~   352   1760   1760 KERNFS
     1     640    640   2400 (sched_create_group+0x1b)
     1      64     64   2464 (percpu_ref_init+0x6a)
     1      32     32   2496 (alloc_fair_sched_group+0x55)
     1      32     32   2528 (alloc_fair_sched_group+0x31)

     4 +   512    512    512 (alloc_fair_sched_group+0x16c)
     4 +   512    512   1024 (alloc_fair_sched_group+0x13e)
     1      12     12   1036 call_site=percpu_ref_init+0x23

cpuset:
------
     5 ~   352   1760   1760 KERNFS
     1    1024   1024   2784 (cpuset_css_alloc+0x2f)
     1      64     64   2848 (percpu_ref_init+0x6a)
     3       8     24   2872 (alloc_cpumask_var_node+0x1f)

     1      12     12     12 call_site=percpu_ref_init+0x23

blkcg:
-----
     6 ~   352   2112   2112 KERNFS
     1     512    512   2624 (blkcg_css_alloc+0x37)
     1      64     64   2688 (percpu_ref_init+0x6a)
     1      32     32   2720 (ioprio_alloc_cpd+0x39)
     1      32     32   2752 (ioc_cpd_alloc+0x39)
     1      32     32   2784 (blkcg_css_alloc+0x66)

     1      12     12     12 call_site=percpu_ref_init+0x23

pid:
---
     3 ~   352   1056   1056 KERNFS
     1     512    512   1568 (pids_css_alloc+0x1b)
     1      64     64   1632 (percpu_ref_init+0x6a)

     1      12     12     12 call_site=percpu_ref_init+0x23

perf:
----
     1     256    256    256 (perf_cgroup_css_alloc+0x1c)
     1      64     64    320 (percpu_ref_init+0x6a)

     1      48     48     48 call_site=perf_cgroup_css_alloc+0x33
     1      12     12     60 call_site=percpu_ref_init+0x23

Thank you,
	Vasily Averin
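
For illustration only: the per-controller totals above can be combined into a
rough estimate of what one new cgroup costs on a machine with a given number of
CPUs, which makes the two corner cases (single-CPU VM vs. "big iron") easy to
compare. This is a minimal sketch, not part of the measurements. The helper name
cgroup_bytes() and the COSTS table layout are assumptions introduced here; the
numbers are the final running totals from the detailed listings.

```python
# (fixed bytes, percpu bytes) per part, copied from the last running total
# of each listing above (e.g. 10840 fixed / 318 percpu for the common part).
COSTS = {
    "common": (10840, 318),
    "memcg": (16968, 4692),
    "cpu": (2528, 1036),
    "cpuset": (2872, 12),
    "blkcg": (2784, 12),
    "pid": (1632, 12),
    "perf": (320, 60),
}


def cgroup_bytes(ncpus, controllers=None):
    """Estimate bytes allocated for one new cgroup (hypothetical helper).

    ncpus       -- number of possible CPUs; percpu costs scale with it
    controllers -- controller names to include; None means all of them.
                   The common part is always counted.
    """
    if controllers is None:
        controllers = [name for name in COSTS if name != "common"]
    names = ["common"] + list(controllers)
    fixed = sum(COSTS[name][0] for name in names)
    percpu = sum(COSTS[name][1] for name in names)
    return fixed + percpu * ncpus


if __name__ == "__main__":
    for ncpus in (1, 2, 64):
        print(ncpus, "CPUs:", cgroup_bytes(ncpus), "bytes per cgroup")
```

With all controllers enabled this gives 44086 bytes on a single CPU and 431032
bytes on 64 CPUs, so the fixed ~38Kb dominates on small VMs while the percpu
part dominates on large machines.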