Re: [PATCH 22/22] __dentry_kill(): new locking scheme

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 09, 2023 at 06:20:56AM +0000, Al Viro wrote:
> Currently we enter __dentry_kill() with parent (along with the victim
> dentry and victim's inode) held locked.  Then we
> 	mark dentry refcount as dead
> 	call ->d_prune()
> 	remove dentry from hash
> 	remove it from the parent's list of children
> 	unlock the parent, don't need it from that point on
> 	detach dentry from inode, unlock dentry and drop the inode
> (via ->d_iput())
> 	call ->d_release()
> 	regain the lock on dentry
> 	check if it's on a shrink list (in which case freeing its empty husk
> has to be left to shrink_dentry_list()) or not (in which case we can free it
> ourselves).  In the former case, mark it as an empty husk, so that
> shrink_dentry_list() would know it can free the sucker.
> 	drop the lock on dentry
> ... and usually the caller proceeds to drop a reference on the parent,
> possibly retaking the lock on it.
> 
> That is painful for a bunch of reasons, starting with the need to take locks
> out of order, but not limited to that - the parent of positive dentry can
> change if we drop its ->d_lock, so getting these locks has to be done with
> care.  Moreover, as soon as dentry is out of the parent's list of children,
> shrink_dcache_for_umount() won't see it anymore, making it appear as if
> the parent is inexplicably busy.  We do work around that by having
> shrink_dentry_list() decrement the parent's refcount first and put it on
> shrink list to be evicted once we are done with __dentry_kill() of child,
> but that may in some cases lead to ->d_iput() on child called after the
> parent got killed.  That doesn't happen in cases where in-tree ->d_iput()
> instances might want to look at the parent, but that's brittle as hell.
> 
> Solution: do removal from the parent's list of children in the very
> end of __dentry_kill().  As the result, the callers do not need to
> lock the parent and by the time we really need the parent locked,
> dentry is negative and is guaranteed not to be moved around.
> 
> It does mean that ->d_prune() will be called with parent not locked.
> It also means that we might see dentries in process of being torn
> down while going through the parent's list of children; those dentries
> will be unhashed, negative and with refcount marked dead.  In practice,
> that's enough for in-tree code that looks through the list of children
> to do the right thing as-is.  Out-of-tree code might need to be adjusted.
> 
> Calling conventions: __dentry_kill(dentry) is called with dentry->d_lock
> held, along with ->i_lock of its inode (if any).  It either returns
> the parent (locked, with refcount decremented to 0) or NULL (if there'd
> been no parent or if refcount decrement for parent hadn't reached 0).
> 
> lock_for_kill() is adjusted for new requirements - it doesn't touch
> the parent's ->d_lock at all.
> 
> Callers adjusted.  Note that for dput() we don't need to bother with
> fast_dput() for the parent - we just need to check retain_dentry()
> for it, since its ->d_lock is still held since the moment when
> __dentry_kill() had taken it to remove the victim from the list of
> children.
> 
> The kludge with early decrement of parent's refcount in
> shrink_dentry_list() is no longer needed - shrink_dcache_for_umount()
> sees the half-killed dentries in the list of children for as long
> as they are pinning the parent.  They are easily recognized and
> accounted for by select_collect(), so we know we are not done yet.
> 
> As the result, we always have the expected ordering for ->d_iput()/->d_release()
> vs. __dentry_kill() of the parent, no exceptions.  Moreover, the current
> rules for shrink lists (one must make sure that shrink_dcache_for_umount()
> won't happen while any dentries from the superblock in question are on
> any shrink lists) are gone - shrink_dcache_for_umount() will do the
> right thing in all cases, taking such dentries out.  Their empty
> husks (memory occupied by struct dentry itself + its external name,
> if any) will remain on the shrink lists, but they are no obstacles
> to filesystem shutdown.  And such husks will get freed as soon as
> shrink_dentry_list() of the list they are on gets to them.
> 
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---

Reviewed-by: Christian Brauner <brauner@xxxxxxxxxx>




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux