On Mon, Nov 06, 2023 at 05:53:53AM +0000, Al Viro wrote: > AFAICS, there are 3 options: > 1) leave the current weirdness with ->d_iput() on child vs __dentry_kill() > on parent. Document the requirement to ->d_iput() (and ->d_release()) to cope > with that, promise that in case of sillyrename the ordering will be there and > write down the proof of that. No code changes, rather revolting docs to > write, trouble waiting to happen in ->d_iput(). > 2) require that shrink_dentry_list() should never overlap with > shrink_dcache_for_umount() on any of the filesystems represented in the > shrink list, guarantee that parent won't get to __dentry_kill() before > the child gets through __dentry_kill() completely and accept that resurrecting > SMO stuff will require more work. Smallish patch, tolerable docs, probably > the best option at the moment. > 3) bite the bullet and get shrink_dentry_list() to coexist with > shrink_dcache_for_umount(), with sane ordering of ->d_iput() vs. parent's > __dentry_kill(). Doable, but AFAICS it will take a counter of children > currently being killed in the parent dentry. shrink_dentry_list() would > bump that on parent, __dentry_kill() the victim, then relock the parent > and decrement that counter along with the main refcount. That would allow > the shrink_dcache_for_umount() to cope with that crap. No requirements > for shrink_dentry_kill() callers that way, sane environment for ->d_iput(), > no obstacles for SMO stuff. OTOH, we need to get space for additional > counter in struct dentry; again, doable (->d_subdirs/->d_child can be > converted to hlist, saving us a pointer in each dentry), but... I'd > leave that option alone until something that needs it would show up > (e.g. if/when Tobin resurrects his patchset). 4) instead of having __dentry_kill() called with dentry, parent and inode locked and doing ->d_prune unhash remove from list of children unlock parent detach from inode unlock dentry and inode drop inode ->d_release relock dentry if on shrink list, mark as ready to free unlock dentry if was not on shrink list, free it go for calling it with just dentry and inode locked and do ->d_prune unhash detach from inode unlock dentry and inode drop inode ->d_release lock parent (if any, as usual) lock dentry remove from list of children if on shrink list, mark as ready to free unlock dentry if was on shrink list, free it decrement parent's refcount (again, if there was a parent) if refcount is still positive - unlock parent and return NULL otherwise return parent What changes: * caller needs milder locking environment; lock_for_kill() gets simpler. Note that only positive dentries can be moved, so inside __dentry_kill() we need no retry loops, etc. - ->d_parent is stable by the point we decide to remove from the list of children. * code that iterates through the list of children (not much of it) needs to cope with seeing negative unhashed dentries with refcount marked dead. Most of it will need no changes at all. * ->d_prune() instances are called without parent's ->d_lock; just the victim's one. Might require changes to out-of-tree filesystems. * dput() turns into if (!dentry) return; rcu_read_lock() if (fast_dput(dentry)) { rcu_read_unlock(); return; } while (lock_for_kill(dentry)) { // not bothering with the parent rcu_read_unlock(); dentry = __dentry_kill(dentry); if (!dentry) return; if (retain_dentry(dentry)) { spin_unlock(&dentry->d_lock); return; } rcu_read_lock(); } spin_unlock(&dentry->d_lock); rcu_read_unlock(); since there's no point trying to avoid locking the parents - we need to grab those locks at some point anyway, just to remove a child from the list of children, and that way we return from __dentry_kill() with that lock held. * shrink_dentry_list() eviction of parents happens thus: do { rcu_read_unlock(); victim = __dentry_kill(victim); rcu_read_lock(); while (victim && lock_for_kill(victim)); rcu_read_unlock(); if (victim) spin_unlock(&victim->d_lock); * sane order of ->d_iput() on child vs. __dentry_kill() on parent. * shrink_dcache_for_umount() does the right thing even if it overlaps shrink_dentry_list(). If that works, it's probably the best variant...