Dear Michal,
do you still have any concerns about this patch set?

Thank you,
	Vasily Averin

On 6/23/22 17:50, Vasily Averin wrote:
> In some cases, creating a cgroup allocates a noticeable amount of memory.
> This operation can be executed from inside a memory-limited container,
> but currently this memory is not accounted to memcg and can be misused.
> This allows a container to exceed its assigned memory limit and avoid
> memcg OOM. Moreover, in case of a global memory shortage on the host,
> the OOM-killer may not find the real memory eater and may start killing
> random processes on the host.
>
> This is especially important for OpenVZ and LXC used on hosting,
> where containers are used by untrusted end users.
>
> Below are the tracing results of mkdir /sys/fs/cgroup/vvs.test on a
> 4-CPU VM with Fedora and a self-compiled upstream kernel. The calculations
> are not precise: they depend on kernel config options, the number of CPUs
> and the enabled controllers, and they ignore possible page allocations etc.
> However, this is enough to clarify the general situation.
> All allocations are split into:
> - common part, always called for each cgroup type
> - per-cgroup allocations
>
> In each group we consider 2 corner cases:
> - usual allocations, important for 1-2 CPU nodes/VMs
> - percpu allocations, important for 'big irons'
>
> common part:	~11KB	+  318 bytes percpu
> memcg:		~17KB	+ 4692 bytes percpu
> cpu:		~2.5KB	+ 1036 bytes percpu
> cpuset:	~3KB	+   12 bytes percpu
> blkcg:		~3KB	+   12 bytes percpu
> pid:		~1.5KB	+   12 bytes percpu
> perf:		~320B	+   60 bytes percpu
> -------------------------------------------
> total:	~38KB	+ 6142 bytes percpu
> currently accounted:	  4668 bytes percpu
>
> - It is important to account the usual allocations made in the
>   common part, because almost all cgroup-specific allocations
>   are small. The one exception is the memory cgroup, which allocates
>   a few huge objects that should be accounted.
> - Percpu allocations made in the common part and in the memcg and cpu
>   cgroups should be accounted; the rest are small and can be ignored.
> - KERNFS objects are allocated both in the common part and in most
>   cgroups.
>
> Details can be found here:
> https://lore.kernel.org/all/d28233ee-bccb-7bc3-c2ec-461fd7f95e6a@xxxxxxxxxx/
>
> I checked the other cgroup types and found that they can all be ignored.
> Additionally, I found an allocation of struct rt_rq called in the cpu
> cgroup if CONFIG_RT_GROUP_SCHED is enabled; it allocates a huge
> (~1700 bytes) percpu structure and should be accounted too.
>
> v5:
>  1) re-based to linux-mm (mm-everything-2022-06-22-20-36)
>
> v4:
>  1) re-based to linux-next (next-20220610)
>     now psi_group is not a part of struct cgroup and is allocated on demand
>  2) added received approval from Muchun Song
>  3) improved cover letter description according to akpm@ request
>
> v3:
>  1) re-based to current upstream (v5.18-11267-gb00ed48bb0a7)
>  2) fixed a few typos
>  3) added received approvals
>
> v2:
>  1) re-split to simplify possible bisect, re-ordered
>  2) added accounting for percpu psi_group_cpu and cgroup_rstat_cpu,
>     allocated in common part
>  3) added accounting for percpu allocation of struct rt_rq
>     (relevant if CONFIG_RT_GROUP_SCHED is enabled)
>  4) improved patch descriptions
>
> Vasily Averin (9):
>   memcg: enable accounting for struct cgroup
>   memcg: enable accounting for kernfs nodes
>   memcg: enable accounting for kernfs iattrs
>   memcg: enable accounting for struct simple_xattr
>   memcg: enable accounting for percpu allocation of struct psi_group_cpu
>   memcg: enable accounting for percpu allocation of struct
>     cgroup_rstat_cpu
>   memcg: enable accounting for large allocations in mem_cgroup_css_alloc
>   memcg: enable accounting for allocations in alloc_fair_sched_group
>   memcg: enable accounting for percpu allocation of struct rt_rq
>
>  fs/kernfs/mount.c      | 6 ++++--
>  fs/xattr.c             | 2 +-
>  kernel/cgroup/cgroup.c | 2 +-
>  kernel/cgroup/rstat.c  | 3 ++-
>  kernel/sched/fair.c    | 4 ++--
>  kernel/sched/psi.c     | 2 +-
>  kernel/sched/rt.c      | 2 +-
>  mm/memcontrol.c        | 4 ++--
>  8 files changed, 14 insertions(+), 11 deletions(-)
>
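
For illustration only, here is a minimal Python sketch that re-adds the
per-controller figures from the quoted table and scales the percpu part by
the CPU count. It is not part of the patch set. It assumes that the table's
kilobyte figures use 1024 bytes, that percpu memory grows linearly with the
number of CPUs, and that all listed controllers are enabled. The names
ALLOCATIONS, ACCOUNTED_PERCPU, totals() and estimate() are hypothetical.

```python
# Approximate sizes copied from the quoted cover letter.
# Each entry: (usual allocations in bytes, percpu allocations in bytes per CPU).
KB = 1024  # assumption: a kilobyte in the table means 1024 bytes

ALLOCATIONS = {
    "common": (11 * KB, 318),
    "memcg": (17 * KB, 4692),
    "cpu": (int(2.5 * KB), 1036),
    "cpuset": (3 * KB, 12),
    "blkcg": (3 * KB, 12),
    "pid": (int(1.5 * KB), 12),
    "perf": (320, 60),
}

# Percpu bytes per CPU that the cover letter lists as already accounted.
ACCOUNTED_PERCPU = 4668


def totals():
    """Sum the usual and the percpu columns over all table rows."""
    usual = sum(u for u, _ in ALLOCATIONS.values())
    percpu = sum(p for _, p in ALLOCATIONS.values())
    return usual, percpu


def estimate(num_cpus):
    """Return (total_bytes, unaccounted_bytes) for one new cgroup.

    Assumes percpu allocations scale linearly with num_cpus.
    """
    usual, percpu = totals()
    total = usual + percpu * num_cpus
    unaccounted = total - ACCOUNTED_PERCPU * num_cpus
    return total, unaccounted


if __name__ == "__main__":
    for cpus in (1, 4, 64):
        total, unaccounted = estimate(cpus)
        print(f"{cpus:3d} CPUs: total {total} bytes, unaccounted {unaccounted} bytes")
```

Under these assumptions, the columns sum to 39232 bytes (about 38 KB) plus
6142 bytes percpu, which matches the "total" row. For the 4-CPU VM used for
tracing, estimate(4) gives 63800 bytes per cgroup, of which 45128 bytes are
not accounted before this series.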