Re: [RFC PATCH] vfs: add rcu-based find_inode variants for iget ops

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 12 Jun 2024 08:00:42 +1000

On Tue, Jun 11, 2024 at 01:28:46PM +0200, Jan Kara wrote:
> On Tue 11-06-24 20:17:16, Dave Chinner wrote:
> > Your patch, however, just converts *some* of the lookup API
> > operations to use RCU. It adds complexity for things like inserts
> > which are going to need inode hash locking if the RCU lookup fails,
> > anyway.
> > 
> > Hence your patch optimises the case where the inode is in cache but
> > the dentry isn't, but we'll still get massive contention on lookup
> > when the RCU lookup on the inode cache and inserts are always going
> > to be required.
> > 
> > IOWs, even RCU lookups are not going to prevent inode hash lock
> > contention for parallel cold cache lookups. Hence, with RCU,
> > applications are going to see unpredictable contention behaviour
> > dependent on the memory footprint of the caches at the time of the
> > lookup. Users will have no way of predicting when the behaviour will
> > change, let alone have any way of mitigating it. Unpredictable
> > variable behaviour is the thing we want to avoid the most with core
> > OS caches.
> 
> I don't believe this is what Mateusz's patches do (but maybe I've terribly
> misread them). iget_locked() does:
> 
> 	spin_lock(&inode_hash_lock);
> 	inode = find_inode_fast(...);
> 	spin_unlock(&inode_hash_lock);
> 	if (inode)
> 		we are happy and return
> 	inode = alloc_inode(sb);
> 	spin_lock(&inode_hash_lock);
> 	old = find_inode_fast(...)
> 	the rest of insert code
> 	spin_unlock(&inode_hash_lock);
> 
> And Mateusz got rid of the first lock-unlock pair by teaching
> find_inode_fast() to *also* operate under RCU. The second lookup &
> insertion stays under inode_hash_lock as it is now.

Yes, I understand that. I also understand what that does to
performance characteristics when memory pressure causes the working
set cache footprint to change. This will result in currently 
workloads that hit the fast path falling off the fast path and
hitting contention and performing no better than they do today.

Remember, the inode has lock is taken when inode are evicted from
memory, too, so contention on the inode hash lock will be much worse
when we are cycling inodes through the cache compared to when we are
just doing hot cache lookups.

> So his optimization is
> orthogonal to your hash bit lock improvements AFAICT.

Not really. RCU for lookups is not necessary when hash-bl is used.
The new apis and conditional locking changes needed for RCU to work
are not needed with hash-bl. hash-bl scales and performs the same
regardless of whether the workload is cache hot or cache-cold.

And the work is almost all done already - it just needs someone with
time to polish it for merge.

> Sure his optimization
> just ~halves the lock hold time for uncached cases (for cached it
> completely eliminates the lock acquisition but I agree these are not that
> interesting) so it is not a fundamental scalability improvement but still
> it is a nice win for a contended lock AFAICT.

Yes, but my point is that it doesn't get rid of the scalability
problem - it just kicks it down the road for small machines and
people with big machines will continue to suffer from the global
lock contention cycling inodes through the inode cache causes...

That's kinda my point - adding RCU doesn't fix the scalability
problem, it simply hides a specific symptom of the problem *in some
environments* for *some worklaods*. hash-bl works for everyone,
everywhere and for all workloads, doesn't require new APIs or
conditional locking changes, and th work is mostly done. Why take a
poor solution when the same amount of verification effort would
finish off a complete solution?

The hash-bl patchset is out there - I don't have time to finish it,
so anyone who has time to work on inode hash lock scalability issues
is free to take it and work on it. I may have written it, but I
don't care who gets credit for getting it merged. Again: why take
a poor solution just because the person who wants the scalability
problem fixed won't pick up the almost finished work that has all
ready been done?

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx