On Fri, Mar 01, 2024 at 11:08:52AM +0700, James Bottomley wrote:
> On Thu, 2024-02-29 at 22:52 -0500, Kent Overstreet wrote:
> > On Fri, Mar 01, 2024 at 10:33:59AM +0700, James Bottomley wrote:
> > > On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> > > > Or maybe you just want the syscall to return an error instead of
> > > > blocking for an unbounded amount of time if userspace asks for
> > > > something silly.
> > >
> > > Warn on allocation above a certain size without MAY_FAIL would seem
> > > to cover all those cases. If there is a case for requiring instant
> > > allocation, you always have GFP_ATOMIC, and, I suppose, we could
> > > even do a bounded reclaim allocation where it tries for a certain
> > > time then fails.
> >
> > Then you're baking in this weird constant into all your algorithms
> > that doesn't scale as machine memory sizes and working set sizes
> > increase.
> >
> > > > Honestly, relying on the OOM killer and saying that because that
> > > > now we don't have to write and test your error paths is a lazy
> > > > cop out.
> > >
> > > OOM Killer is the most extreme outcome. Usually reclaim (hugely
> > > simplified) dumps clean cache first and tries the shrinkers then
> > > tries to write out dirty cache. Only after that hasn't found
> > > anything after a few iterations will the oom killer get activated
> >
> > All your caches dumped and the machine grinds to a halt and then a
> > random process gets killed instead of simply _failing the
> > allocation_.
>
> Ignoring the fact free invective below, I think what you're asking for
> is strict overcommit. There's a tunable for that:
>
> https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>
> However, see the Gotchas section for why we can't turn it on globally,
> but it is available to you if you know what you're doing.

James, I already explained all this.