> More concretely, these are workloads which used to completely occupy a
> single machine, though within containers but without limits. These
> workloads used to look at machine level metrics at startup on how much
> resources are available.

I've been there but haven't found a convincing mapping of global to memcg
limits. The issue is that such a value won't guarantee the absence of OOM
when staying below it, because the limit can (generally) be effectively
shared.

(Alas, apps typically don't express their memory needs in units of PSI.
So it boils down to a system-wide monitor like systemd-oomd and
cooperation with it.)

> Now these workloads are being moved to multi-tenant environment but
> still the machine is partitioned statically between the workloads. So,
> these workloads need to know upfront how much resources are allocated to
> them upfront and the way the cgroup hierarchy is setup, that information
> is a bit above the tree.

FTR, e.g. in systemd setups, this can be partially overcome by the exposed
EffectiveMemoryMax= property (the service manager that configures the
resources can also do the ancestry traversal). Kubernetes has the Downward
API, where generic resource info is shared into containers, and I recall
that lxcfs could mangle procfs memory info wrt memory limits for legacy
apps.

As I think about it, the cgroupns (in)visibility should be resolved by
assigning the proper limit to the namespace's root group memory.max
(read-only for the contained user), and the traversal...

On Thu, Feb 06, 2025 at 11:37:31AM -0800, "T.J. Mercier" <tjmercier@xxxxxxxxxx> wrote:
> but having a single file to read instead of walking up the
> tree with multiple reads to calculate an effective limit would be
> nice.

...in kernel is nice, but the possible performance gain isn't worth hiding
the shareability of the effective limit.

So I wonder what the current PoV of more MM people is...

Michal
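
P.S. For illustration only, a minimal userspace sketch of the ancestry
traversal mentioned above (walk up from the process's cgroup and take the
minimum of all memory.max values). It assumes a cgroup v2 unified hierarchy
mounted at /sys/fs/cgroup; the helper names are made up and this is not a
proposed interface:

import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumption: cgroup v2 mounted here

def own_cgroup_path():
    """Absolute cgroupfs path of the current process, as seen in this cgroupns."""
    with open("/proc/self/cgroup") as f:
        for line in f:
            if line.startswith("0::"):  # cgroup v2 entry, e.g. "0::/user.slice/..."
                return os.path.normpath(CGROUP_ROOT + line.split("::", 1)[1].strip())
    raise RuntimeError("no cgroup v2 entry in /proc/self/cgroup")

def effective_memory_max():
    """Minimum memory.max over the ancestor chain; None means unlimited ("max")."""
    limit = None
    path = own_cgroup_path()
    while path.startswith(CGROUP_ROOT):
        try:
            with open(os.path.join(path, "memory.max")) as f:
                value = f.read().strip()
            if value != "max":
                limit = int(value) if limit is None else min(limit, int(value))
        except FileNotFoundError:
            pass  # the (visible) root cgroup has no memory.max file
        if path == CGROUP_ROOT:
            break
        path = os.path.dirname(path)
    return limit

if __name__ == "__main__":
    lim = effective_memory_max()
    print("effective memory.max:", "max" if lim is None else lim)

Note that /proc/self/cgroup is relative to the caller's cgroup namespace,
so such a walk naturally stops at the namespace root, which is why giving
that root group a proper memory.max matters. And of course the result is
still only an upper bound that may be effectively shared with siblings.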