Re: [RFC] simplifying fast_dput(), dentry_kill() et.al.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 06, 2023 at 05:53:53AM +0000, Al Viro wrote:

> AFAICS, there are 3 options:
> 	1) leave the current weirdness with ->d_iput() on child vs __dentry_kill()
> on parent.  Document the requirement to ->d_iput() (and ->d_release()) to cope
> with that, promise that in case of sillyrename the ordering will be there and
> write down the proof of that.  No code changes, rather revolting docs to
> write, trouble waiting to happen in ->d_iput().
> 	2) require that shrink_dentry_list() should never overlap with
> shrink_dcache_for_umount() on any of the filesystems represented in the
> shrink list, guarantee that parent won't get to __dentry_kill() before
> the child gets through __dentry_kill() completely and accept that resurrecting
> SMO stuff will require more work.  Smallish patch, tolerable docs, probably
> the best option at the moment.
> 	3) bite the bullet and get shrink_dentry_list() to coexist with
> shrink_dcache_for_umount(), with sane ordering of ->d_iput() vs. parent's
> __dentry_kill().  Doable, but AFAICS it will take a counter of children
> currently being killed in the parent dentry.  shrink_dentry_list() would
> bump that on parent, __dentry_kill() the victim, then relock the parent
> and decrement that counter along with the main refcount.  That would allow
> the shrink_dcache_for_umount() to cope with that crap.  No requirements
> for shrink_dentry_kill() callers that way, sane environment for ->d_iput(),
> no obstacles for SMO stuff.  OTOH, we need to get space for additional
> counter in struct dentry; again, doable (->d_subdirs/->d_child can be
> converted to hlist, saving us a pointer in each dentry), but... I'd
> leave that option alone until something that needs it would show up
> (e.g. if/when Tobin resurrects his patchset).

	4) instead of having __dentry_kill() called with dentry, parent
and inode locked and doing
	->d_prune
	unhash
	remove from list of children
	unlock parent
	detach from inode
	unlock dentry and inode
	drop inode
	->d_release
	relock dentry
	if on shrink list, mark as ready to free 
	unlock dentry
	if was not on shrink list, free it
go for calling it with just dentry and inode locked and do
	->d_prune
	unhash
	detach from inode
	unlock dentry and inode
	drop inode
	->d_release
	lock parent (if any, as usual)
	lock dentry
	remove from list of children
	if on shrink list, mark as ready to free
	unlock dentry
	if was on shrink list, free it
	decrement parent's refcount (again, if there was a parent)
	if refcount is still positive - unlock parent and return NULL
	otherwise return parent

What changes:
	* caller needs milder locking environment; lock_for_kill() gets simpler.
	  Note that only positive dentries can be moved, so inside __dentry_kill()
	  we need no retry loops, etc. - ->d_parent is stable by the point we decide
	  to remove from the list of children.
	* code that iterates through the list of children (not much of it)
	  needs to cope with seeing negative unhashed dentries with
	  refcount marked dead.  Most of it will need no changes at all.
	* ->d_prune() instances are called without parent's ->d_lock; just
	  the victim's one.  Might require changes to out-of-tree filesystems.
	* dput() turns into
	if (!dentry)
		return;
	rcu_read_lock()
	if (fast_dput(dentry)) {
		rcu_read_unlock();
		return;
	}
	while (lock_for_kill(dentry)) { // not bothering with the parent
		rcu_read_unlock();
		dentry = __dentry_kill(dentry);
		if (!dentry)
			return;
		if (retain_dentry(dentry)) {
			spin_unlock(&dentry->d_lock);
			return;
		}
		rcu_read_lock();
	}
	spin_unlock(&dentry->d_lock);
	rcu_read_unlock();
since there's no point trying to avoid locking the parents - we need
to grab those locks at some point anyway, just to remove a child from
the list of children, and that way we return from __dentry_kill() with
that lock held.
	* shrink_dentry_list() eviction of parents happens thus:
	do {
		rcu_read_unlock();
		victim = __dentry_kill(victim);
		rcu_read_lock();
	while (victim && lock_for_kill(victim));
	rcu_read_unlock();
	if (victim)
		spin_unlock(&victim->d_lock);
	* sane order of ->d_iput() on child vs. __dentry_kill() on parent.
	* shrink_dcache_for_umount() does the right thing even if it
overlaps shrink_dentry_list().

	If that works, it's probably the best variant...




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux