On Thu, Jun 01, 2023 at 03:15:28PM -0400, Waiman Long wrote:
> On 6/1/23 14:38, Dan Schatzberg wrote:
> > The existing documentation refers to memory.high as the "main mechanism
> > to control memory usage." This seems incorrect to me - memory.high can
> > result in reclaim pressure which simply leads to stalls unless some
> > external component observes and actions on it (e.g. systemd-oomd can be
> > used for this purpose). While this is feasible, users are unaware of
> > this interaction and are led to believe that memory.high alone is an
> > effective mechanism for limiting memory.
> >
> > The documentation should recommend the use of memory.max as the
> > effective way to enforce memory limits - it triggers reclaim and results
> > in OOM kills by itself.
>
> That is not how my understanding of memory.high works. When memory usage
> goes past memory.high, memory reclaim will be initiated to reclaim the
> memory back. Stall happens when memory.usage keep increasing like by
> consuming memory faster than what memory reclaim can recover. When
> memory.max is reached, OOM killer will then kill off the tasks.

This was the initial plan indeed: Slow down the workload and thus slow
the growth; hope that the workload recovers with voluntary frees; set
memory.max as a safety if it keeps going beyond.

This never panned out. Once workloads are stuck, they might not back
down on their own. By increasingly slowing growth, it becomes harder
and harder for them to reach the memory.max intervention point.

It's a very brittle configuration strategy. Unless you very carefully
calibrate memory.high and memory.max together with awareness of the
throttling algorithm, workloads that hit memory.high will just go to
sleep indefinitely. They require outside intervention that either
adjusts limits or implements kill policies based on observed sleeps
(they're reported as pressure via psi).

So the common use cases today end up being that memory.max is for
enforcing kernel OOM kills, and memory.high is a tool to implement
userspace OOM killing policies.

Dan is right to point out the additional expectations for userspace
management when memory.high is in use. And memory.max is still the
primary, works-out-of-the-box method of memory containment.
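
To make the userspace side concrete, here is a rough sketch in Python of
what such a kill policy could look like. It is only an illustration: the
helper names and the 80% threshold are made up, it is not how
systemd-oomd or any other tool actually decides, and it assumes a
cgroup2 hierarchy where the group exposes memory.pressure (psi) and
cgroup.kill (only present on newer kernels).

```python
from pathlib import Path


def parse_psi(text):
    """Parse psi text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def should_kill(psi, threshold=80.0):
    """Hypothetical policy: kill once 'full' avg10 stall time reaches threshold (%)."""
    return psi.get("full", {}).get("avg10", 0.0) >= threshold


def check_cgroup(cgroup_dir, threshold=80.0):
    """Read memory.pressure for one cgroup and kill the group if it is stuck.

    cgroup_dir is an assumed path such as /sys/fs/cgroup/<group>.
    Returns True if a kill was requested.
    """
    cg = Path(cgroup_dir)
    psi = parse_psi((cg / "memory.pressure").read_text())
    if should_kill(psi, threshold):
        # Writing "1" to cgroup.kill asks the kernel to kill every
        # process in the cgroup (assumes the file exists on this kernel).
        (cg / "cgroup.kill").write_text("1")
        return True
    return False
```

A real agent would run something like check_cgroup() periodically (or
use psi triggers) and would likely look at more than one signal before
killing. The point is just that with memory.high alone, something
outside the kernel has to make this decision.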