On Thu, 2024-02-29 at 22:52 -0500, Kent Overstreet wrote:
> On Fri, Mar 01, 2024 at 10:33:59AM +0700, James Bottomley wrote:
> > On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> > > Or maybe you just want the syscall to return an error instead of
> > > blocking for an unbounded amount of time if userspace asks for
> > > something silly.
> >
> > Warn on allocation above a certain size without MAY_FAIL would seem
> > to cover all those cases. If there is a case for requiring instant
> > allocation, you always have GFP_ATOMIC, and, I suppose, we could
> > even do a bounded reclaim allocation where it tries for a certain
> > time then fails.
>
> Then you're baking in this weird constant into all your algorithms
> that doesn't scale as machine memory sizes and working set sizes
> increase.
>
> > > Honestly, relying on the OOM killer and saying that because that
> > > now we don't have to write and test your error paths is a lazy
> > > cop out.
> >
> > OOM Killer is the most extreme outcome. Usually reclaim (hugely
> > simplified) dumps clean cache first and tries the shrinkers then
> > tries to write out dirty cache. Only after that hasn't found
> > anything after a few iterations will the oom killer get activated
>
> All your caches dumped and the machine grinds to a halt and then a
> random process gets killed instead of simply _failing the
> allocation_.

Ignoring the fact-free invective below, I think what you're asking for
is strict overcommit. There's a tunable for that:

https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

See the Gotchas section for why we can't turn it on globally, but it
is available to you if you know what you're doing.

James
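
For anyone wanting to check what their own machine is doing: the tunable in
that document is vm.overcommit_memory, where 0 is the heuristic default, 1 is
always overcommit and 2 is strict accounting. Below is a minimal sketch, not
anything from the kernel tree, that reads the setting through procfs and
prints what it means. The helper names describe_overcommit and current_mode
are made up for illustration, and it assumes a Linux system with /proc
mounted.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory, following the kernel's
# overcommit-accounting document.
MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting (don't overcommit)",
}


def describe_overcommit(raw: str) -> str:
    """Map the text read from the sysctl file to a short description.

    Hypothetical helper, named here for illustration only.
    """
    try:
        mode = int(raw.strip())
    except ValueError:
        return "unrecognised value: " + repr(raw)
    return MODES.get(mode, "unknown mode %d" % mode)


def current_mode(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read the live setting from procfs (Linux only)."""
    return describe_overcommit(Path(path).read_text())


if __name__ == "__main__":
    try:
        print(current_mode())
    except OSError as exc:
        # Not Linux, or /proc is not mounted.
        print("could not read the sysctl:", exc)
```

Switching to mode 2 is a root-only sysctl write and affects every process on
the box, which is why the Gotchas section is worth reading before trying it.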