Re: an argument for keeping oom_control in cgroups v2

Tejun Heo <tj@xxxxxxxxxx> · Mon, 22 Aug 2022 17:22:53 -1000

(cc'ing memcg folks for visibility)

On Mon, Aug 22, 2022 at 08:04:02AM -0400, Chris Frey wrote:
> In cgroups v1 we had:
> 
> 	memory.soft_limit_in_bytes
> 	memory.limit_in_bytes
> 	memory.memsw.limit_in_bytes
> 	memory.oom_control
> 
> Using these features, we could achieve:
> 
> 	- cause programs that were memory hungry to suffer performance, but
> 	  not stop (soft limit)
> 
> 	- cause programs to swap before the system actually ran out of memory
> 	  (limit)
> 
> 	- cause programs to be OOM-killed if they used too much swap
> 	  (memsw.limit...)
> 
> 	- cause programs to halt instead of get killed (oom_control)
> 
> That last feature is something I haven't seen duplicated in the settings
> for cgroups v2.  In terms of handling a truly non-malicious memory hungry
> program, it is a feature that has no equal, because the user may require
> time to free up memory elsewhere before allocating more to the program,
> and he may not want the performance degradation, nor the loss of work,
> that comes from the other options.
> 
> Is there a reason why it wasn't included in v2?  Is there hope that it will
> come back?

memcg folks will have better answers, but the short answer is twofold: the
kernel really doesn't like giving control of a task stuck with an arbitrary
backtrace to userspace, and kernel OOM detection is often way too late. So
cgroup2 instead goes for enabling userspace-driven OOM detection and
handling through PSI (pressure stall information). The following doc has
some information on it.

 https://facebookmicrosites.github.io/resctl-demo-website/docs/demo_docs/res_protection/oomd-daemon

FYI, systemd already has its own oomd implementation in systemd-oomd.
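
To make the PSI approach a bit more concrete, here is a minimal sketch of
the detection half in Python. It parses the "some"/"full" lines that a
cgroup2 memory.pressure file exposes and applies a simple threshold. The
function names, the sample numbers, and the 40% threshold are made up for
illustration; this is not how oomd or systemd-oomd actually decide.

```python
from typing import Dict


def parse_pressure(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {"some": {"avg10": ...}, "full": {...}}."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def under_pressure(text: str, threshold: float = 40.0) -> bool:
    """Hypothetical policy: flag the cgroup when 'full' avg10 exceeds threshold (%)."""
    stats = parse_pressure(text)
    return stats.get("full", {}).get("avg10", 0.0) > threshold


# Made-up sample in the memory.pressure format, used instead of a real file
# so the sketch runs anywhere.
SAMPLE = (
    "some avg10=12.50 avg60=3.10 avg300=0.80 total=1234567\n"
    "full avg10=45.00 avg60=1.20 avg300=0.30 total=456789\n"
)

if __name__ == "__main__":
    print(under_pressure(SAMPLE))
```

On a real system the text would come from the memory.pressure file in the
cgroup's directory (typically somewhere under /sys/fs/cgroup, depending on
where cgroup2 is mounted), polled periodically. What to do once the
threshold trips is a policy decision left to the userspace agent.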

Thanks.

-- 
tejun