On 5/30/22 14:55, Michal Hocko wrote:
> On Mon 30-05-22 14:25:45, Vasily Averin wrote:
>> Below is tracing results of mkdir /sys/fs/cgroup/vvs.test on
>> 4cpu VM with Fedora and self-compiled upstream kernel. The calculations
>> are not precise, it depends on kernel config options, number of cpus,
>> enabled controllers, ignores possible page allocations etc.
>> However this is enough to clarify the general situation.
>> All allocations are split into:
>> - common part, always called for each cgroup type
>> - per-cgroup allocations
>>
>> In each group we consider 2 corner cases:
>> - usual allocations, important for 1-2 CPU nodes/Vms
>> - percpu allocations, important for 'big irons'
>>
>> common part:	~11Kb	+  318 bytes percpu
>> memcg:		~17Kb	+ 4692 bytes percpu
>> cpu:		~2.5Kb	+ 1036 bytes percpu
>> cpuset:	~3Kb	+   12 bytes percpu
>> blkcg:		~3Kb	+   12 bytes percpu
>> pid:		~1.5Kb	+   12 bytes percpu
>> perf:		~320b	+   60 bytes percpu
>> -------------------------------------------
>> total:	~38Kb	+ 6142 bytes percpu
>> currently accounted:	  4668 bytes percpu
>>
>> - it's important to account usual allocations called
>> in common part, because almost all of cgroup-specific allocations
>> are small. One exception here is memory cgroup, it allocates a few
>> huge objects that should be accounted.
>> - Percpu allocation called in common part, in memcg and cpu cgroups
>> should be accounted, rest ones are small and can be ignored.
>> - KERNFS objects are allocated both in common part and in most of
>> cgroups
>>
>> Details can be found here:
>> https://lore.kernel.org/all/d28233ee-bccb-7bc3-c2ec-461fd7f95e6a@xxxxxxxxxx/
>>
>> I checked other cgroup types and found that they all can be ignored.
>> Additionally I found allocation of struct rt_rq called in cpu cgroup
>> if CONFIG_RT_GROUP_SCHED was enabled, it allocates huge (~1700 bytes)
>> percpu structure and should be accounted too.
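As a rough illustration of how the totals quoted above scale with CPU count,
here is a small sketch. It assumes that "Kb" means 1024 bytes, that the
"bytes percpu" figures (including the 4668 bytes currently accounted) are
per-CPU sizes multiplied by the number of CPUs, and that the helper names are
purely illustrative. Real numbers depend on the kernel config, as noted in the
quoted changelog.

```python
# Figures copied from the quoted totals; "Kb" is assumed to be 1024 bytes.
REGULAR_BYTES = 38 * 1024       # ~38Kb of ordinary allocations per cgroup
PERCPU_BYTES = 6142             # percpu bytes per cgroup, per CPU
ACCOUNTED_PERCPU_BYTES = 4668   # percpu bytes already accounted, per CPU


def total_bytes(ncpus: int) -> int:
    """Approximate memory consumed by creating one cgroup."""
    return REGULAR_BYTES + PERCPU_BYTES * ncpus


def unaccounted_bytes(ncpus: int) -> int:
    """Part of that memory that is not charged to any memcg today."""
    return total_bytes(ncpus) - ACCOUNTED_PERCPU_BYTES * ncpus


if __name__ == "__main__":
    for ncpus in (4, 64, 256):
        print(ncpus, total_bytes(ncpus), unaccounted_bytes(ncpus))
```

Under these assumptions the unaccounted part is the ~38Kb of ordinary
allocations plus 1474 bytes for every CPU, per created cgroup.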
>
> One thing that the changelog is missing is an explanation why do we need
> to account those objects. Users are usually not empowered to create
> cgroups arbitrarily. Or at least they shouldn't because we can expect
> more problems to happen.
>
> Could you clarify this please?

The problem is relevant for OS-level containers such as LXC or OpenVZ.
They are widely used for hosting and allow untrusted end users to run
containers. Root inside such a container can create cgroups inside its
own container and consume host memory without proper accounting.

Thank you,
	Vasily Averin