Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 27 Feb 2024 11:43:29 +1100

On Mon, Feb 26, 2024 at 06:29:43PM -0500, Kent Overstreet wrote:
> On Mon, Feb 26, 2024 at 01:55:10PM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 26, 2024 at 04:19:14PM -0500, Kent Overstreet wrote:
> > > > RCU allocating and freeing of memory can already be fairly significant
> > > > depending on workload, and I'd expect that to grow - we really just need
> > > > a way for reclaim to kick RCU when needed (and probably add a percpu
> > > > counter for "amount of memory stranded until the next RCU grace
> > > > period").
> > 
> > There are some APIs for that, though they are sharp-edged and mainly
> > intended for rcutorture, and there are some hooks for a CI Kconfig
> > option called RCU_STRICT_GRACE_PERIOD that could be organized into
> > something useful.
> > 
> > Of course, if there is a long-running RCU reader, there is nothing
> > RCU can do.  By definition, it must wait on all pre-existing readers,
> > no exceptions.
> > 
> > But my guess is that you instead are thinking of memory-exhaustion
> > emergencies where you would like RCU to burn more CPU than usual to
> > reduce grace-period latency.  For that, there are definitely things
> > that can be done.
> > 
> > I am sure that there are more questions that I should ask, but the one
> > that comes immediately to mind is "Is this API call an occasional thing,
> > or does RCU need to tolerate many CPUs hammering it frequently?"
> > Either answer is fine, I just need to know.  ;-)
> 
> Well, we won't want it getting hammered on continuously - we should be
> able to tune reclaim so that doesn't happen.

If we are under sustained memory pressure, there will be a
relatively steady state of "stranded memory" - every RCU grace
period will be stranding and freeing roughly the same amount of
memory because reclaim progress across all caches won't change
significantly from grace period to grace period.
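
As a rough illustration of the steady-state argument (a userspace toy
model, not kernel code; stranded_after() and its parameters are made up
for this sketch), assume reclaim queues a constant amount of memory per
grace period and that memory queued during one grace period is only
freed at the end of the next one:

```c
#include <assert.h>

/*
 * Toy model of "stranded memory" under sustained pressure.
 *
 * Assumptions (simplifications, not measured behaviour):
 *  - reclaim queues `queued_per_gp` bytes for RCU freeing in every
 *    grace period;
 *  - memory queued during one grace period is freed at the end of
 *    the following grace period.
 *
 * Returns the number of bytes still waiting after `nr_gps` periods.
 */
static long stranded_after(long queued_per_gp, int nr_gps)
{
	long waiting = 0;	/* queued in the previous period */
	long incoming = 0;	/* queued in the current period */

	assert(queued_per_gp >= 0 && nr_gps >= 0);

	for (int gp = 0; gp < nr_gps; gp++) {
		/* The older batch is freed; the newer batch now waits. */
		waiting = incoming;
		incoming = queued_per_gp;
	}
	return waiting + incoming;
}
```

In this model the stranded amount climbs to twice the per-period queue
rate within two grace periods and then stays flat, however long the
pressure lasts.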

I really haven't seen "stranded memory" from reclaimable slab caches
(like inodes and dentries) ever causing issues with allocation or
causing OOM kills.  Hence I'm not sure that there is any real need
for expediting the freeing of RCU memory in the general case - it's
probably only when we get near OOM (i.e. reclaim priority is
approaching 0) that expediting the freeing of RCU-deferred memory may make any
difference to allocation success...
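
A minimal sketch of what such a "only near OOM" policy could look like,
written as a userspace model. DEF_PRIORITY matches the kernel's starting
reclaim priority; EXPEDITE_PRIORITY and should_expedite_rcu() are
hypothetical names and the threshold value is an arbitrary assumption,
not an existing kernel interface:

```c
#include <assert.h>
#include <stdbool.h>

/* Reclaim starts at this priority and counts down towards 0. */
#define DEF_PRIORITY		12

/* Hypothetical cut-off: priorities 0..2 count as "near OOM". */
#define EXPEDITE_PRIORITY	2

/*
 * Decide whether reclaim should ask RCU to hurry up.
 * Only worth doing when reclaim is close to giving up and there is
 * actually memory waiting on a grace period.
 */
static bool should_expedite_rcu(int priority, long stranded_pages)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY);

	if (priority > EXPEDITE_PRIORITY)
		return false;	/* normal reclaim, leave RCU alone */

	return stranded_pages > 0;
}
```

With a check like this, the general case never touches RCU at all, so
the "many CPUs hammering it" concern only arises in the last few
reclaim passes before OOM.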

> I think getting numbers on the amount of memory stranded waiting for RCU
> is probably first order of business - minor tweak to kfree_rcu() et al.
> for that; there are APIs they can query to maintain that counter.

Yes, please. Get some numbers that show there is an actual problem
here that needs solving.
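
For reference, the kind of counter being discussed can be modelled in
userspace as below. This is only a sketch: the array stands in for a
kernel per-CPU counter (percpu_counter_add() and percpu_counter_sum()
would be the rough in-kernel equivalents), and NR_CPUS_MODEL,
stranded_add() and stranded_sum() are names invented for the example:

```c
#include <assert.h>

#define NR_CPUS_MODEL	4	/* hypothetical CPU count for the model */

/* One slot per CPU so the hot path never shares a cache line. */
static long stranded_bytes[NR_CPUS_MODEL];

/*
 * Account `bytes` on `cpu`: positive when memory is queued for RCU
 * freeing, negative when the grace period has elapsed and it is freed.
 */
static void stranded_add(int cpu, long bytes)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	stranded_bytes[cpu] += bytes;
}

/* Slow path: total memory currently waiting on a grace period. */
static long stranded_sum(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += stranded_bytes[cpu];
	return sum;
}
```

Memory may be queued on one CPU and freed on another, so an individual
slot can go negative; only the sum is meaningful.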

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx