On Thu, Apr 11, 2019 at 05:47:46AM +0100, Al Viro wrote: > Note, BTW, that umount coming between isolate and drop is not a problem; > it call shrink_dcache_parent() on the root. And if shrink_dcache_parent() > finds something on (another) shrink list, it won't put it to the shrink > list of its own, but it will make note of that and repeat the scan in > such case. So if we find something with zero refcount and not on > shrink list, we can move it to our shrink list and be sure that its > superblock won't go away under us... Aaaarrgghhh... No, we can't. Look: we get one candidate dentry in isolate phase. We put it into shrink list. umount(2) comes and calls shrink_dcache_for_umount(), which calls shrink_dcache_parent(root). In the meanwhile, shrink_dentry_list() is run and does __dentry_kill() on that one dentry. Fine, it's gone - before shrink_dcache_parent() even sees it. Now shrink_dentry_list() holds a reference to its parent and is about to drop it in dentry = parent; while (dentry && !lockref_put_or_lock(&dentry->d_lockref)) dentry = dentry_kill(dentry); And dropped it will be, but... shrink_dcache_parent() has finished the scan, without finding *anything* with zero refcount - the thing that used to be on the shrink list was already gone before shrink_dcache_parent() has gotten there and the reference to parent was not dropped yet. So shrink_dcache_for_umount() plows past shrink_dcache_parent(), walks the tree and complains loudly about "busy" dentries (that parent we hadn't finished dropping), and then we proceed with filesystem shutdown. In the meanwhile, dentry_kill() finally gets to killing dentry and triggers an unexpected late call of ->d_iput() on a filesystem that has already been far enough into shutdown - far enough to destroy the data structures needed for that sucker. The reason we don't hit that problem with regular memory shrinker is this: unregister_shrinker(&s->s_shrink); fs->kill_sb(s); in deactivate_locked_super(). IOW, shrinker for this fs is gone before we get around to shutdown. And so are all normal sources of dentry eviction for that fs. Your earlier variants all suffer the same problem - picking a page shared by dentries from several superblocks can run into trouble if it overlaps with umount of one of those. Fuck... One variant of solution would be to have per-superblock struct kmem_cache to be used for dentries of that superblock. However, * we'd need to prevent them getting merged * it would add per-superblock memory costs (for struct kmem_cache and associated structures) * it might mean more pages eaten by the dentries - on average half a page per superblock (more if there are very few dentries on that superblock) OTOH, it might actually improve the memory footprint - all dentries sharing a page would be from the same superblock, so the use patterns might be more similar, which might lower the fragmentation... Hell knows... I'd like to hear an opinion from VM folks on that one. Comments?