On Thu, Nov 09, 2023 at 06:20:56AM +0000, Al Viro wrote: > Currently we enter __dentry_kill() with parent (along with the victim > dentry and victim's inode) held locked. Then we > mark dentry refcount as dead > call ->d_prune() > remove dentry from hash > remove it from the parent's list of children > unlock the parent, don't need it from that point on > detach dentry from inode, unlock dentry and drop the inode > (via ->d_iput()) > call ->d_release() > regain the lock on dentry > check if it's on a shrink list (in which case freeing its empty husk > has to be left to shrink_dentry_list()) or not (in which case we can free it > ourselves). In the former case, mark it as an empty husk, so that > shrink_dentry_list() would know it can free the sucker. > drop the lock on dentry > ... and usually the caller proceeds to drop a reference on the parent, > possibly retaking the lock on it. > > That is painful for a bunch of reasons, starting with the need to take locks > out of order, but not limited to that - the parent of positive dentry can > change if we drop its ->d_lock, so getting these locks has to be done with > care. Moreover, as soon as dentry is out of the parent's list of children, > shrink_dcache_for_umount() won't see it anymore, making it appear as if > the parent is inexplicably busy. We do work around that by having > shrink_dentry_list() decrement the parent's refcount first and put it on > shrink list to be evicted once we are done with __dentry_kill() of child, > but that may in some cases lead to ->d_iput() on child called after the > parent got killed. That doesn't happen in cases where in-tree ->d_iput() > instances might want to look at the parent, but that's brittle as hell. > > Solution: do removal from the parent's list of children in the very > end of __dentry_kill(). As the result, the callers do not need to > lock the parent and by the time we really need the parent locked, > dentry is negative and is guaranteed not to be moved around. > > It does mean that ->d_prune() will be called with parent not locked. > It also means that we might see dentries in process of being torn > down while going through the parent's list of children; those dentries > will be unhashed, negative and with refcount marked dead. In practice, > that's enough for in-tree code that looks through the list of children > to do the right thing as-is. Out-of-tree code might need to be adjusted. > > Calling conventions: __dentry_kill(dentry) is called with dentry->d_lock > held, along with ->i_lock of its inode (if any). It either returns > the parent (locked, with refcount decremented to 0) or NULL (if there'd > been no parent or if refcount decrement for parent hadn't reached 0). > > lock_for_kill() is adjusted for new requirements - it doesn't touch > the parent's ->d_lock at all. > > Callers adjusted. Note that for dput() we don't need to bother with > fast_dput() for the parent - we just need to check retain_dentry() > for it, since its ->d_lock is still held since the moment when > __dentry_kill() had taken it to remove the victim from the list of > children. > > The kludge with early decrement of parent's refcount in > shrink_dentry_list() is no longer needed - shrink_dcache_for_umount() > sees the half-killed dentries in the list of children for as long > as they are pinning the parent. They are easily recognized and > accounted for by select_collect(), so we know we are not done yet. > > As the result, we always have the expected ordering for ->d_iput()/->d_release() > vs. __dentry_kill() of the parent, no exceptions. Moreover, the current > rules for shrink lists (one must make sure that shrink_dcache_for_umount() > won't happen while any dentries from the superblock in question are on > any shrink lists) are gone - shrink_dcache_for_umount() will do the > right thing in all cases, taking such dentries out. Their empty > husks (memory occupied by struct dentry itself + its external name, > if any) will remain on the shrink lists, but they are no obstacles > to filesystem shutdown. And such husks will get freed as soon as > shrink_dentry_list() of the list they are on gets to them. > > Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> > --- Reviewed-by: Christian Brauner <brauner@xxxxxxxxxx>