On Mon, Feb 10, 2025 at 05:24:17PM +0100, Michal Koutný wrote:
> Hello.
>
> On Thu, Feb 06, 2025 at 11:09:05AM -0800, Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > Oh I totally forgot about your series. In my use-case, it is not about
> > dynamically knowing how much they can expand and adjust themselves but
> > rather knowing statically upfront what resources they have been given.
>
> From the memcg PoV, the effective value doesn't tell how much they were
> given (because of sharing).

It's definitely true that if you have an ancestral limit over several otherwise unlimited siblings, then interpreting this number as "this is how much memory I have available" will be completely misleading.

I would also say that sharing a limit with several siblings requires a certain degree of awareness and cooperation between them. From that POV, IMO it would be fine to provide a metric with contextual caveats.

The problem is: what do we do with canned, unaware, maybe untrusted applications? And they don't necessarily know which they are. It depends heavily on the judgement of the administrator of any given deployment.

Some workloads might be completely untrusted and hard limited. Another deployment might consider the same workload predictable enough that it's configured only with a failsafe max limit set much higher than where the workload is *expected* to operate. The allotment might happen entirely through min/low protections, with no max limit. Or there could be a combination of a protection slightly below and a limit slightly above the expected workload size.

It seems basically impossible to write portable code against this without knowing the intent of the person setting it up. But how do we communicate intent down to the container? The two broad options are to do it implicitly or explicitly:

a) Provide a cgroup file that automatically derives the intended target size from how min/low/high/max are set up. Right now those can be set up super loosely depending on what the administrator thinks about the application. In order for this to work, we'd likely have to define an idiomatic way of configuring the controller. E.g. if you set max by itself, we assume this is the target size. If you set low, with or without max, then low is the target size. Or if you set both, the target is somewhere in between. (A rough sketch of such a derivation follows at the end of this mail.)

I'm not completely convinced this is workable. It might require settings beyond what's actually needed for the safe containment of the workload, which carries the risk of excluding otherwise useful configurations. I don't mean enforced configuration rules, but rather the case where a configuration is reasonable and effective given the workload and environment, but the target file now shows nonsense.

b) Provide a cgroup file that is freely configurable by the administrator with the target size of the container.

This has obvious drawbacks as well. What's the default value? Also, a lot of setups are dead simple: set a hard limit and expect the workload to adhere to it, period. Nobody is going to reliably set yet another cgroup file that a workload may or may not consume.

The third option is to wash our hands of all of this, provide the static hierarchy settings to the leaves (like this patch does, extended to the other knobs as well), and let userspace figure it out. (Also sketched below.)

Thoughts?
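
P.S. To make option a) a bit more concrete, here is a rough, untested sketch of one possible derivation idiom. The function name, the "unlimited" encoding, and the precedence rules (including the arbitrary midpoint for "in between") are all made up for illustration, not a proposal for actual kernel logic:

#include <stdio.h>

#define MEM_UNLIMITED (~0ULL)

/*
 * Hypothetical idiom: max alone means target == max, low alone means
 * target == low, both set means the target sits somewhere in between
 * (midpoint chosen arbitrarily here). 0 means low is unset.
 */
static unsigned long long derive_target(unsigned long long low,
					unsigned long long max)
{
	if (low && max != MEM_UNLIMITED)
		return low + (max - low) / 2;	/* both set: in between */
	if (low)
		return low;			/* low is the target */
	if (max != MEM_UNLIMITED)
		return max;			/* max alone is the target */
	return MEM_UNLIMITED;			/* no intent expressed */
}

int main(void)
{
	/* e.g. memory.low = 4G, memory.max = 8G -> 6G target */
	printf("%llu\n", derive_target(4ULL << 30, 8ULL << 30));
	return 0;
}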
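And for the third option, a similarly rough sketch of what "let userspace figure it out" could look like for the hard limit: walk the cgroupfs ancestry and take the minimum memory.max. The mount point and leaf path are assumptions; a real implementation would resolve its own cgroup from /proc/self/cgroup and do proper error handling:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read memory.max from one cgroup dir; ~0ULL means "max" or absent. */
static unsigned long long read_max(const char *cgpath)
{
	unsigned long long val = ~0ULL;
	char file[PATH_MAX + 16], buf[32];
	FILE *f;

	snprintf(file, sizeof(file), "%s/memory.max", cgpath);
	f = fopen(file, "r");
	if (!f)
		return val;
	if (fgets(buf, sizeof(buf), f) && strncmp(buf, "max", 3))
		val = strtoull(buf, NULL, 10);
	fclose(f);
	return val;
}

int main(void)
{
	char path[PATH_MAX] = "/sys/fs/cgroup/foo/bar"; /* example leaf */
	unsigned long long eff = ~0ULL;

	/* Minimum of memory.max over the ancestry (root has no file). */
	while (strlen(path) > strlen("/sys/fs/cgroup")) {
		unsigned long long m = read_max(path);

		if (m < eff)
			eff = m;
		*strrchr(path, '/') = '\0';
	}
	printf("effective memory.max: %llu\n", eff);
	return 0;
}

Note this only works this simply for max (and high); for min/low a plain minimum over ancestors isn't the right aggregation, since effective protection depends on siblings and usage, which is part of what makes this messy for userspace in the first place.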