On Wed, Feb 16, 2022 at 03:27:39AM +0000, Al Viro wrote:
> On Tue, Feb 15, 2022 at 06:24:53PM -0800, Stephen Brennan wrote:
>
> > It seems to me that, if we had taken a reference on child by
> > incrementing the reference count prior to unlocking it, then
> > dentry_unlist could never have been called, since we would never have
> > made it into __dentry_kill. child would still be on the list, and any
> > cursor (or sweep_negative) list updates would now be reflected in
> > child->d_child.next. But dput is definitely not safe while holding a
> > lock on a parent dentry (even more so now thanks to my patch), so that
> > is out of the question.
> >
> > Would dput_to_list be an appropriate solution to that issue? We can
> > maintain a dispose list in d_walk and then for any dput which really
> > drops the refcount to 0, we can handle them after d_walk is done. It
> > shouldn't be that many dentries anyway.
>
> 	Interesting idea, but... what happens to the behaviour of e.g.
> shrink_dcache_parent()? You'd obviously need to modify the test in
> select_collect(), but then the selected dentries become likely candidates
> for d_walk() itself wanting to move them over to its internal shrink list.
> OTOH, __dput_to_list() will just decrement the count and skip the sucker
> if it's already on a shrink list...
>
> 	It might work, but it really needs a careful analysis wrt.
> parallel d_walk(). What happens when you have two threads hitting
> shrink_dcache_parent() on two different places, one being an ancestor
> of the other? That can happen in parallel, and currently it does work
> correctly, but that's fairly delicate and there are places where a minor
> change could turn O(n) into O(n^2), etc.
>
> 	Let me think about that - I'm not saying it's hopeless, and it
> would be nice to avoid that subtlety in dentry_unlist(), but there
> might be dragons.

PS: another obvious change is that d_walk() would become blocking. So e.g.
int path_has_submounts(const struct path *parent)
{
	struct check_mount data = { .mnt = parent->mnt, .mounted = 0 };

	read_seqlock_excl(&mount_lock);
	d_walk(parent->dentry, &data, path_check_mount);
	read_sequnlock_excl(&mount_lock);

	return data.mounted;
}

would need a rework - d_walk() is under a spinlock here.

	Another potential headache in that respect is d_genocide() - currently
non-blocking, with this change extremely likely to do evictions. That,
however, is not a problem for current in-tree callers - they are all
shortly followed by shrink_dcache_parent() or equivalents.
path_has_submounts(), though...

	I'd really hate to reintroduce the "call this on entry/call this on
exit" callbacks. Perhaps it would be better to pass the dispose list to
d_walk() and have the callers deal with evictions? For that matter,
shrink_dcache_parent() and friends would be just fine passing the same
list they are collecting into.

<looks at path_has_submounts() callers>

*growl*

	autofs_d_automount() has it called under sbi->fs_lock. So we'd need
to take the disposal all the way out there, and export shrink_dentry_list()
while we are at it. Not pretty ;-/

	And no, we can't make the disposal async, so offloading it to a
worker thread is not feasible...