On 6/23/22 19:55, Shakeel Butt wrote:
> On Thu, Jun 23, 2022 at 9:07 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>>
>> On Thu 23-06-22 18:03:31, Vasily Averin wrote:
>>> Dear Michal,
>>> do you still have any concerns about this patch set?
>>
>> Yes, I do not think we have concluded this to be really necessary. IIRC
>> Roman would like to see lingering cgroups addressed in not-so-distant
>> future (http://lkml.kernel.org/r/Ypd2DW7id4M3KJJW@carbon) and we already
>> have a limit for the number of cgroups in the tree. So why should we
>> chase after allocations that correspond the cgroups and somehow try to
>> cap their number via the memory consumption. This looks like something
>> that will get out of sync eventually and it also doesn't seem like the
>> best control to me (comparing to an explicit limit to prevent runaways).
>> --
>
> Let me give a counter argument to that. On a system running multiple
> workloads, how can the admin come up with a sensible limit for the
> number of cgroups? There will definitely be jobs that require much
> more number of sub-cgroups. Asking the admins to dynamically tune
> another tuneable is just asking for more complications. At the end all
> the users would just set it to max.
>
> I would recommend to see the commit ac7b79fd190b ("inotify, memcg:
> account inotify instances to kmemcg") where there is already a sysctl
> (inotify/max_user_instances) to limit the number of instances but
> there was no sensible way to set that limit on a multi-tenant system.

I've found that MEM_CGROUP_ID_MAX limits memory cgroups only.
Other types of cgroups do not have similar restrictions.
Yes, we can set some per-container limit for all cgroups,
but to me that looks like a workaround, while proper memory
accounting looks like the real solution.

By the way, could you please explain why memory cgroups have the
MEM_CGROUP_ID_MAX limit? Why is it required at all, and why was it
set to USHRT_MAX?
I believe that in the future this limit may really become reachable.

Suppose we set the per-container cgroup limit to some small number,
for example 512, as OpenVZ does right now. On a real node with 300
containers we can easily get 100*300 = 30000 cgroups and consume
~3 GB of memory without any misuse.
I think that is too much memory to leave unaccounted.

Thank you,
	Vasily Averin
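
P.S. The estimate above can be reproduced with a short sketch. It is
only back-of-the-envelope arithmetic under stated assumptions: the
average of 100 cgroups per container is the figure used in the
estimate, and the per-cgroup cost of about 100 KiB is not measured,
it is simply what ~3 GB spread over 30000 cgroups implies. The
constant and function names are made up for illustration.

```python
# Back-of-the-envelope estimate of cgroup count and memory on one node.
CONTAINERS = 300                 # containers on the node (from the estimate)
CGROUPS_PER_CONTAINER = 100      # assumed average, below the 512 per-container cap
BYTES_PER_CGROUP = 100 * 1024    # ASSUMPTION: ~100 KiB each, implied by ~3 GB / 30000
MEM_CGROUP_ID_MAX = 65535        # USHRT_MAX, the memcg ID limit discussed above


def estimate(containers, cgroups_per_container, bytes_per_cgroup):
    """Return (total cgroups, total bytes consumed by them)."""
    total = containers * cgroups_per_container
    return total, total * bytes_per_cgroup


total_cgroups, total_bytes = estimate(
    CONTAINERS, CGROUPS_PER_CONTAINER, BYTES_PER_CGROUP
)

print(f"cgroups on node : {total_cgroups}")
print(f"memory consumed : {total_bytes / 1e9:.2f} GB")
print(f"share of ID max : {total_cgroups / MEM_CGROUP_ID_MAX:.0%}")
```

With these inputs the node sits at roughly half of USHRT_MAX before
any container comes close to its own 512 limit.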