Re: [RFC] simplifying fast_dput(), dentry_kill() et.al.


 



On Tue, Oct 31, 2023 at 12:18:48AM +0000, Al Viro wrote:
> On Mon, Oct 30, 2023 at 12:18:28PM -1000, Linus Torvalds wrote:
> > On Mon, 30 Oct 2023 at 11:53, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > After fixing a couple of brainos, it seems to work.
> > 
> > This all makes me unnaturally nervous, probably because it's overly
> > subtle, and I have lost the context for some of the rules.
> 
> A bit of context: I started to look at the possibility of refcount overflows.
> Writing the current rules for dentry refcounting and lifetime down was the
> obvious first step, and that immediately turned into an awful mess.
> 
> It is overly subtle.

	Another piece of too subtle shite: ordering of ->d_iput() of child
and __dentry_kill() of parent.  As it is, in some cases it is possible for
the latter to happen before the former.  It is *not* possible in the cases
when in-tree ->d_iput() instances actually look at the parent (all of those
are due to sillyrename stuff), but the proof is convoluted and very brittle.

	The origin of that mess is in the interaction of shrink_dcache_for_umount()
with shrink_dentry_list().  What we want to avoid is a directory looking like
it's busy since shrink_dcache_for_umount() doesn't see any children to account
for positive refcount of parent.  The kinda-sorta solution we use is to decrement
the parent's refcount *before* __dentry_kill() of child and put said parent
into a shrink list.  That makes shrink_dcache_for_umount() do the right thing,
but it's possible to end up with parent freed before the child is done with;
scenario is non-obvious, and rather hard to hit, but it's not impossible.

	dput() does no such thing - it does not decrement the parent's
refcount until the child had been taken care of.  That's fine, as far
as shrink_dcache_for_umount() is concerned - this is not a false positive;
with slightly different timing shrink_dcache_for_umount() would've reported
the child as being busy.  IOW, there should be no overlap between dput()
in one thread and shrink_dcache_for_umount() in another.  Unfortunately,
memory eviction *can* come in the middle of shrink_dcache_for_umount().

	Life would be much simpler if shrink_dentry_list() did not have
to pull that kind of trick and used the same ordering as dput() does.
IMO there's a reasonably cheap way to achieve that:

	* have shrink_dcache_for_umount() mark the superblock (either in
->s_flags or inside the ->s_dentry_lru itself) and have the logic
in retain_dentry() that does insertion into LRU list check ->d_sb for that
mark, treating its presence as "do not retain".
	* after marking the superblock shrink_dcache_for_umount() is guaranteed
that nothing new will be added to shrink list in question.  Have it call
shrink_dcache_sb() to drain LRU.
	* Now shrink_dentry_list() in one thread hitting a dentry on
a superblock going through shrink_dcache_for_umount() in another thread is
always a bug and reporting busy dentries is the right thing to do.
So we can switch shrink_dentry_list() to the same "drop reference to parent
only after the child had been killed" ordering as we have in dput().
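
	To make the ordering concrete, here is a single-threaded userland
toy model of "kill the child first, drop the reference to the parent
afterwards".  It is only a sketch, not kernel code: the struct layout,
kill_model(), d_iput_model() and dput_style() are made-up names, a NULL
->d_parent stands for the root, and no locking or shrink lists are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy dentry: only the fields needed to show the ordering. */
struct dentry {
	struct dentry *d_parent;	/* NULL for the root in this model */
	int count;			/* stands in for the refcount */
	bool killed;
	bool iput_saw_live_parent;	/* recorded by d_iput_model() */
};

/* Stand-in for a ->d_iput() instance that looks at the parent. */
static void d_iput_model(struct dentry *d)
{
	d->iput_saw_live_parent = !d->d_parent || !d->d_parent->killed;
}

/* Stand-in for __dentry_kill(): runs the "iput" step, then marks it dead. */
static void kill_model(struct dentry *d)
{
	assert(!d->killed);
	d_iput_model(d);
	d->killed = true;
}

/*
 * dput()-style ordering: the reference the child holds on its parent
 * is dropped only after the child has been killed, so the parent
 * cannot be killed before the child's d_iput_model() has run.
 */
static void dput_style(struct dentry *d)
{
	while (d && --d->count == 0) {
		struct dentry *parent = d->d_parent;

		kill_model(d);
		d = parent;
	}
}
```

With a child holding the only reference to its parent, dput_style() on
the child kills both, and the child's "iput" step always observes a
live parent.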

	IMO that removes a fairly nasty trap for ->d_iput() and ->d_delete()
instances.  As for the overhead, the relevant fragment of retain_dentry() is
	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
		d_lru_add(dentry);
	else if (unlikely(!(dentry->d_flags & DCACHE_REFERENCED)))
		dentry->d_flags |= DCACHE_REFERENCED;
	return true;
That would become
	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST))) {
		if (unlikely(dentry->d_sb is marked))
			return false;
		d_lru_add(dentry);
	} else if (unlikely(!(dentry->d_flags & DCACHE_REFERENCED)))
		dentry->d_flags |= DCACHE_REFERENCED;
	return true;
Note that d_lru_add() will hit ->d_sb->s_dentry_lru, so we are not
adding memory traffic here; the else if part doesn't need to be
touched - we only need to prevent insertions into LRU.
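
	For anyone who wants to poke at the behaviour outside the kernel,
the modified fragment can be modelled in userland roughly like this.
Everything here is an assumption made for illustration: the flag values,
the SB_MODEL_DYING bit (standing in for whatever mark ends up being used),
the nr_lru counter standing in for ->s_dentry_lru, and the name
retain_dentry_tail().  It is single-threaded and ignores locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; not the kernel's definitions. */
#define DCACHE_REFERENCED	0x1u
#define DCACHE_LRU_LIST		0x2u
#define SB_MODEL_DYING		0x1u	/* hypothetical "umount in progress" mark */

struct super_block {
	unsigned int s_flags;
	long nr_lru;		/* stands in for ->s_dentry_lru */
};

struct dentry {
	unsigned int d_flags;
	struct super_block *d_sb;
};

/* Toy d_lru_add(): mark the dentry and account for it on its superblock. */
static void d_lru_add(struct dentry *dentry)
{
	assert(!(dentry->d_flags & DCACHE_LRU_LIST));
	dentry->d_flags |= DCACHE_LRU_LIST;
	dentry->d_sb->nr_lru++;
}

/* Models only the tail of retain_dentry() quoted above. */
static bool retain_dentry_tail(struct dentry *dentry)
{
	if (!(dentry->d_flags & DCACHE_LRU_LIST)) {
		if (dentry->d_sb->s_flags & SB_MODEL_DYING)
			return false;	/* marked superblock: do not retain */
		d_lru_add(dentry);
	} else if (!(dentry->d_flags & DCACHE_REFERENCED)) {
		dentry->d_flags |= DCACHE_REFERENCED;
	}
	return true;
}
```

Once the mark is set, a dentry that is not yet on the LRU is refused and
the LRU count stops growing, while a dentry already on the LRU still
takes the unchanged "else if" path.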

	Comments?



