Re: [PATCH] Documentation: Clarify usage of memory limits

Johannes Weiner <hannes@xxxxxxxxxxx> · Thu, 1 Jun 2023 15:53:45 -0400

On Thu, Jun 01, 2023 at 03:15:28PM -0400, Waiman Long wrote:
> On 6/1/23 14:38, Dan Schatzberg wrote:
> > The existing documentation refers to memory.high as the "main mechanism
> > to control memory usage." This seems incorrect to me - memory.high can
> > result in reclaim pressure which simply leads to stalls unless some
> > external component observes and actions on it (e.g. systemd-oomd can be
> > used for this purpose). While this is feasible, users are unaware of
> > this interaction and are led to believe that memory.high alone is an
> > effective mechanism for limiting memory.
> > 
> > The documentation should recommend the use of memory.max as the
> > effective way to enforce memory limits - it triggers reclaim and results
> > in OOM kills by itself.
> 
> That is not how my understanding of memory.high works. When memory usage
> goes past memory.high, memory reclaim will be initiated to reclaim the
> memory back. Stall happens when memory.usage keep increasing like by
> consuming memory faster than what memory reclaim can recover. When
> memory.max is reached, OOM killer will then kill off the tasks.

This was the initial plan indeed: Slow down the workload and thus slow
the growth; hope that the workload recovers with voluntary frees; set
memory.max as a safety if it keeps going beyond.

This never panned out. Once workloads are stuck, they might not back
down on their own. By increasingly slowing growth, it becomes harder
and harder for them to reach the memory.max intervention point.

It's a very brittle configuration strategy. Unless you very carefully
calibrate memory.high and memory.max together with awareness of the
throttling algorithm, workloads that hit memory.high will just go to
sleep indefinitely. They require outside intervention that either
adjusts limits or implements kill policies based on observed sleeps
(they're reported as pressure via psi).

So the common usecases today end up being that memory.max is for
enforcing kernel OOM kills, and memory.high is a tool to implement
userspace OOM killing policies.

Dan is right to point out the additional expectations for userspace
management when memory.high is in used. And memory.max is still the
primary, works-out-of-the-box method of memory containment.