Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Tue, 27 Feb 2024 08:06:52 -0800

On Tue, Feb 27, 2024 at 10:52:51AM -0500, Kent Overstreet wrote:
> On Tue, Feb 27, 2024 at 07:32:32AM -0800, Paul E. McKenney wrote:
> > I could simply use the same general approach that I use within RCU
> > itself, which currently has absolutely no idea how much memory (if any)
> > that each callback will free.  Especially given that some callbacks
> > free groups of memory blocks, while others free nothing.  ;-)
> > 
> > Alternatively, we could gather statistics on the amount of memory freed
> > by each callback and use that as an estimate.
> > 
> > But we should instead step back and ask exactly what we are trying to
> > accomplish here, which just might be what Dave Chinner was getting at.
> > 
> > At a ridiculously high level, reclaim is looking for memory to free.
> > Some read-only memory can often be dropped immediately on the grounds
> > that its data can be read back in if needed.  Other memory can only be
> > dropped after being written out, which involves a delay.  There are of
> > course many other complications, but this will do for a start.
> > 
> > So, where does RCU fit in?
> > 
> > RCU fits in between the two.  With memory awaiting RCU, there is no need
> > to write anything out, but there is a delay.  As such, memory waiting
> > for an RCU grace period is similar to memory that is to be reclaimed
> > after its I/O completes.
> > 
> > One complication, and a complication that we are considering exploiting,
> > is that, unlike reclaimable memory waiting for I/O, we could often
> > (but not always) have some control over how quickly RCU's grace periods
> > complete.  And we already do this programmatically by using the choice
> > between synchronize_rcu() and synchronize_rcu_expedited().  The question
> > is whether we should expedite normal RCU grace periods during reclaim,
> > and if so, under what conditions.
> > 
> > You identified one potential condition, namely the amount of memory
> > waiting to be reclaimed.  One complication with this approach is that RCU
> > has no idea how much memory each callback represents, and for call_rcu(),
> > there is no way for it to find out.  For kfree_rcu(), there are ways,
> > but as you know, I am questioning whether those ways are reasonable from
> > a performance perspective.  But even if they are, we would be accepting
> > more error from the memory waiting via call_rcu() than we would be
> > accepting if we just counted blocks instead of bytes for kfree_rcu().
> 
> You're _way_ overcomplicating this.

Sorry, but no.

Please read the remainder of my prior email carefully.

							Thanx, Paul

> The relevant thing to consider is the relative cost of __ksize() and
> kfree_rcu(). __ksize() is already pretty cheap, and with slab gone and
> space available in struct slab we can get it down to a single load.
> 
> > Let me reiterate that:  The estimation error that you are objecting to
> > for kfree_rcu() is completely and utterly unavoidable for call_rcu().
> 
> hardly, callsites manually freeing memory after an RCU grace
> period can do the accounting themselves - if they're hot enough to matter,
> most aren't
> 
> and with memory allocation profiling coming, which also tracks # of
> allocations, we'll also have an easy way to spot those.
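
For illustration only, the accounting trade-off debated above can be sketched as a small userspace model. This is not kernel code: struct pending_estimate, note_sized(), note_unsized(), estimate_bytes(), should_expedite(), the per-callback guess, and the threshold are all hypothetical names and parameters invented for this sketch. It assumes that kfree_rcu()-style objects can report a size (for example via something like __ksize()), while call_rcu()-style callbacks cannot, so their contribution has to be guessed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct pending_estimate {
	size_t sized_blocks;	/* kfree_rcu()-style objects with a known size */
	size_t sized_bytes;	/* total bytes across those objects */
	size_t unsized_cbs;	/* call_rcu()-style callbacks of unknown size */
};

/* Record an object whose size is known when it is queued. */
static void note_sized(struct pending_estimate *p, size_t bytes)
{
	assert(p != NULL);
	p->sized_blocks++;
	p->sized_bytes += bytes;
}

/* Record a callback that may free anything from nothing to many blocks. */
static void note_unsized(struct pending_estimate *p)
{
	assert(p != NULL);
	p->unsized_cbs++;
}

/*
 * Estimate the bytes waiting for a grace period: exact for sized objects,
 * plus an assumed average (guess_per_cb) for each unsized callback.
 */
static size_t estimate_bytes(const struct pending_estimate *p,
			     size_t guess_per_cb)
{
	return p->sized_bytes + p->unsized_cbs * guess_per_cb;
}

/* Assumed policy: expedite once the estimate reaches a threshold. */
static bool should_expedite(const struct pending_estimate *p,
			    size_t guess_per_cb, size_t threshold)
{
	return estimate_bytes(p, guess_per_cb) >= threshold;
}
```

In this model the error in the estimate comes entirely from guess_per_cb, which is the point of contention in the thread: how much that guess matters depends on how much memory is actually freed through unsized callbacks compared with sized ones.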