Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Fri, 01 Mar 2024 11:08:52 +0700

On Thu, 2024-02-29 at 22:52 -0500, Kent Overstreet wrote:
> On Fri, Mar 01, 2024 at 10:33:59AM +0700, James Bottomley wrote:
> > On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> > > Or maybe you just want the syscall to return an error instead of
> > > blocking for an unbounded amount of time if userspace asks for
> > > something silly.
> > 
> > Warn on allocation above a certain size without MAY_FAIL would seem
> > to cover all those cases.  If there is a case for requiring instant
> > allocation, you always have GFP_ATOMIC, and, I suppose, we could
> > even do a bounded reclaim allocation where it tries for a certain
> > time then fails.
> 
> Then you're baking in this weird constant into all your algorithms
> that doesn't scale as machine memory sizes and working set sizes
> increase.
> 
> > > Honestly, relying on the OOM killer and saying that because that
> > > now we don't have to write and test your error paths is a lazy
> > > cop out.
> > 
> > OOM Killer is the most extreme outcome.  Usually reclaim (hugely
> > simplified) dumps clean cache first and tries the shrinkers then
> > tries to write out dirty cache.  Only after that hasn't found
> > anything after a few iterations will the oom killer get activated
> 
> All your caches dumped and the machine grinds to a halt and then a
> random process gets killed instead of simply _failing the
> allocation_.

Ignoring the fact-free invective below, I think what you're asking for
is strict overcommit.  There's a tunable for that:

https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

See the Gotchas section for why we can't turn it on globally, but it
is available to you if you know what you're doing.
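
As a rough sketch of how you might check which mode a machine is in: the
tunable is exposed as /proc/sys/vm/overcommit_memory, where 0 is the
heuristic default, 1 is always overcommit, and 2 is strict accounting.
The helper names below are made up for illustration and are not part of
any kernel or library interface.

```python
from pathlib import Path

# Values documented for vm.overcommit_memory; the descriptions are paraphrased.
MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}


def parse_overcommit_mode(text):
    """Hypothetical helper: turn the file contents into (value, description)."""
    value = int(text.strip())
    if value not in MODES:
        raise ValueError(f"unexpected overcommit mode: {value}")
    return value, MODES[value]


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Hypothetical helper: read the current mode from procfs (Linux only)."""
    return parse_overcommit_mode(Path(path).read_text())


if __name__ == "__main__":
    try:
        value, description = read_overcommit_mode()
        print(f"vm.overcommit_memory = {value}: {description}")
    except OSError as exc:
        # Not Linux, or procfs is not mounted.
        print(f"could not read the tunable: {exc}")
```

Switching to strict mode is a root-only write of 2 to the same file (or
sysctl vm.overcommit_memory=2), and in that mode the commit limit is
governed by vm.overcommit_ratio.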

James