On Tue, Feb 27, 2024 at 10:52:51AM -0500, Kent Overstreet wrote:
> On Tue, Feb 27, 2024 at 07:32:32AM -0800, Paul E. McKenney wrote:
> > I could simply use the same general approach that I use within RCU
> > itself, which currently has absolutely no idea how much memory (if any)
> > that each callback will free.  Especially given that some callbacks
> > free groups of memory blocks, while others free nothing.  ;-)
> >
> > Alternatively, we could gather statistics on the amount of memory freed
> > by each callback and use that as an estimate.
> >
> > But we should instead step back and ask exactly what we are trying to
> > accomplish here, which just might be what Dave Chinner was getting at.
> >
> > At a ridiculously high level, reclaim is looking for memory to free.
> > Some read-only memory can often be dropped immediately on the grounds
> > that its data can be read back in if needed.  Other memory can only be
> > dropped after being written out, which involves a delay.  There are of
> > course many other complications, but this will do for a start.
> >
> > So, where does RCU fit in?
> >
> > RCU fits in between the two.  With memory awaiting RCU, there is no need
> > to write anything out, but there is a delay.  As such, memory waiting
> > for an RCU grace period is similar to memory that is to be reclaimed
> > after its I/O completes.
> >
> > One complication, and a complication that we are considering exploiting,
> > is that, unlike reclaimable memory waiting for I/O, we could often
> > (but not always) have some control over how quickly RCU's grace periods
> > complete.  And we already do this programmatically by using the choice
> > between synchronize_rcu() and synchronize_rcu_expedited().  The question
> > is whether we should expedite normal RCU grace periods during reclaim,
> > and if so, under what conditions.
> >
> > You identified one potential condition, namely the amount of memory
> > waiting to be reclaimed.  One complication with this approach is that RCU
> > has no idea how much memory each callback represents, and for call_rcu(),
> > there is no way for it to find out.  For kfree_rcu(), there are ways,
> > but as you know, I am questioning whether those ways are reasonable from
> > a performance perspective.  But even if they are, we would be accepting
> > more error from the memory waiting via call_rcu() than we would be
> > accepting if we just counted blocks instead of bytes for kfree_rcu().
>
> You're _way_ overcomplicating this.

Sorry, but no.

Please read the remainder of my prior email carefully.

							Thanx, Paul

> The relevant thing to consider is the relative cost of __ksize() and
> kfree_rcu(). __ksize() is already pretty cheap, and with slab gone and
> space available in struct slab we can get it down to a single load.
>
> > Let me reiterate that:  The estimation error that you are objecting to
> > for kfree_rcu() is completely and utterly unavoidable for call_rcu().
>
> hardly, callsites manually freeing memory manually after an RCU grace
> period can do the accounting manually - if they're hot enough to matter,
> most aren't
>
> and with memory allocation profiling coming, which also tracks # of
> allocations, we'll also have an easy way to spot those.
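
For readers following the blocks-versus-bytes argument above, here is a
minimal userspace sketch of the accounting being debated. It is not kernel
code and does not reflect how RCU is implemented: struct pending_stats, the
pending_*() helpers, the assumed average size for opaque callbacks, and the
byte threshold are all hypothetical names and parameters chosen for
illustration. Sized entries stand in for kfree_rcu()-style frees whose size
could be looked up, and opaque entries stand in for call_rcu()-style
callbacks whose size is unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of memory waiting for a grace period. */
struct pending_stats {
	size_t blocks;      /* every queued deferred free or callback */
	size_t known_bytes; /* bytes from entries that reported a size */
	size_t unsized;     /* entries whose size is unknown */
};

/* Queue a deferred free whose size is known (kfree_rcu()-like). */
static inline void pending_add_sized(struct pending_stats *ps, size_t bytes)
{
	ps->blocks++;
	ps->known_bytes += bytes;
}

/* Queue an opaque callback (call_rcu()-like): only the count is known. */
static inline void pending_add_opaque(struct pending_stats *ps)
{
	ps->blocks++;
	ps->unsized++;
}

/*
 * Estimate pending bytes: exact for sized entries, and an assumed
 * average (a guess supplied by the caller) for opaque ones.
 */
static inline size_t pending_estimate_bytes(const struct pending_stats *ps,
					    size_t assumed_avg)
{
	return ps->known_bytes + ps->unsized * assumed_avg;
}

/* Example policy: ask for a faster grace period past a byte threshold. */
static inline bool pending_should_expedite(const struct pending_stats *ps,
					   size_t assumed_avg,
					   size_t threshold_bytes)
{
	return pending_estimate_bytes(ps, assumed_avg) >= threshold_bytes;
}

/* Model a completed grace period: everything queued has been freed. */
static inline void pending_grace_period_done(struct pending_stats *ps)
{
	assert(ps->unsized <= ps->blocks); /* sanity check on the model */
	ps->blocks = 0;
	ps->known_bytes = 0;
	ps->unsized = 0;
}
```

Queuing two sized entries of 64 and 192 bytes plus one opaque entry, with
an assumed average of 128 bytes, gives blocks == 3 and an estimate of 384
bytes. The estimate is only as good as the assumed average for the opaque
entries, which is the error term the two sides of this thread disagree
about.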