On Thu, Nov 09, 2023 at 06:20:51AM +0000, Al Viro wrote:
> That is to say, do *not* treat the ->d_inode or ->d_parent changes
> as "it's hard, return false; somebody must have grabbed it, so
> even if has zero refcount, we don't need to bother killing it -
> final dput() from whoever grabbed it would've done everything".
>
> First of all, that is not guaranteed.  It might have been dropped
> by shrink_kill() handling of victim's parent, which would've found
> it already on a shrink list (ours) and decided that they don't need
> to put it on their shrink list.
>
> What's more, dentry_kill() is doing pretty much the same thing,
> cutting its own set of corners (it assumes that dentry can't
> go from positive to negative, so its inode can change but only once
> and only in one direction).
>
> Doing that right allows to get rid of that not-quite-duplication
> and removes the only reason for re-incrementing refcount before
> the call of dentry_kill().
>
> Replacement is called lock_for_kill(); called under rcu_read_lock
> and with ->d_lock held.  If it returns false, dentry has non-zero
> refcount and the same locks are held.  If it returns true,
> dentry has zero refcount and all locks required by __dentry_kill()
> are taken.
>
> Part of __lock_parent() had been lifted into lock_parent() to
> allow its reuse.  Now it's called with rcu_read_lock already
> held and dentry already unlocked.
>
> Note that this is not the final change - locking requirements for
> __dentry_kill() are going to change later in the series and the
> set of locks taken by lock_for_kill() will be adjusted.
>
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---

It's a bit unfortunate that __lock_parent() locks the parent *and* may
lock the child, which isn't really obvious from the name. That this is
assumed only becomes clear from how callers release the child's lock.
>  fs/dcache.c | 159 ++++++++++++++++++++++------------------------------
>  1 file changed, 66 insertions(+), 93 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 23afcd48c1a9..801502871671 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -625,8 +625,6 @@ static void __dentry_kill(struct dentry *dentry)
>  static struct dentry *__lock_parent(struct dentry *dentry)
>  {
>  	struct dentry *parent;
> -	rcu_read_lock();
> -	spin_unlock(&dentry->d_lock);
>  again:
>  	parent = READ_ONCE(dentry->d_parent);
>  	spin_lock(&parent->d_lock);
> @@ -642,7 +640,6 @@ static struct dentry *__lock_parent(struct dentry *dentry)
>  		spin_unlock(&parent->d_lock);
>  		goto again;
>  	}
> -	rcu_read_unlock();
>  	if (parent != dentry)
>  		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
>  	else
> @@ -657,7 +654,64 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
>  		return NULL;
>  	if (likely(spin_trylock(&parent->d_lock)))
>  		return parent;
> -	return __lock_parent(dentry);
> +	rcu_read_lock();
> +	spin_unlock(&dentry->d_lock);
> +	parent = __lock_parent(dentry);
> +	rcu_read_unlock();
> +	return parent;
> +}
> +
> +/*
> + * Lock a dentry for feeding it to __dentry_kill().
> + * Called under rcu_read_lock() and dentry->d_lock; the former
> + * guarantees that nothing we access will be freed under us.
> + * Note that dentry is *not* protected from concurrent dentry_kill(),
> + * d_delete(), etc.
> + *
> + * Return false if dentry is busy.  Otherwise, return true and have
> + * that dentry's inode and parent both locked.
> + */
> +
> +static bool lock_for_kill(struct dentry *dentry)
> +{
> +	struct inode *inode = dentry->d_inode;
> +	struct dentry *parent = dentry->d_parent;
> +
> +	if (unlikely(dentry->d_lockref.count))
> +		return false;
> +
> +	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
> +		goto slow;
> +	if (dentry == parent)
> +		return true;
> +	if (likely(spin_trylock(&parent->d_lock)))
> +		return true;
> +
> +	if (inode)
> +		spin_unlock(&inode->i_lock);
> +slow:
> +	spin_unlock(&dentry->d_lock);
> +
> +	for (;;) {
> +		if (inode)
> +			spin_lock(&inode->i_lock);
> +		parent = __lock_parent(dentry);

We're under rcu_read_lock() here. Are we sure this can't trigger RCU
stall warnings because we're spinning? Maybe there's a reason that's
not an issue here.

That spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED) in
__lock_parent() is there for the sake of lockdep, to verify that the
parent lock is always acquired before the child lock?