Re: dcache locking question

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 15, 2019 at 10:19:39AM +1100, Tobin C. Harding wrote:
> On Fri, Mar 15, 2019 at 09:56:32AM +1100, Tobin C. Harding wrote:
> > Hi,
> > 
> > I'm not able to understand the locking order in dcache.  dcache has been
> > around for a while so clearly its right and I'm wrong.
> > 
> > Could someone please explain to me how the locking order commented at
> > the top of the file is not violated in the following:
> 
> > 
> > From top of fs/dcache.c
> > 
> >  * If there is an ancestor relationship:
> >  * dentry->d_parent->...->d_parent->d_lock
> >  *   ...
> >  *     dentry->d_parent->d_lock
> >  *       dentry->d_lock
> > 
> 
> This is a more simple example
> 
> static struct dentry *dentry_kill(struct dentry *dentry)
> 	__releases(dentry->d_lock)
> {
>         ...
> 
> slow_positive:
> 	spin_unlock(&dentry->d_lock);
> 	spin_lock(&inode->i_lock);
> 	spin_lock(&dentry->d_lock);
> 	parent = lock_parent(dentry);

Yes?  Why is that a problem?  We start with ->d_lock held.
We drop it.  At that point we are holding a reference to
dentry and no locks whatsoever.  We know that inode is not
going away (we are holding a reference to positive dentry,
so its ->d_inode can't change under us, and it contributes
to inode refcount).  So we can safely grab its ->i_lock.
Then we grab (in normal order) ->d_lock.  Then, in
lock_parent() we check if ->d_parent points to ourselves.
->d_parent is stabilized by ->d_lock, so that check
if fine.  If it does *not* point to ourselves, we do
trylock on its ->d_parent. Also fine - ->d_parent isn't
going anywhere and trylock can't lead to deadlocks;
that's what "try" in the name is about.  If it succeeds,
we have
	* inode->i_lock held
	* dentry->d_lock held
	* parent->d_lock held
	* inode == dentry->d_inode
	* parent == dentry->d_parent
and we are fine, except that the reference we are holding
might not be the only existing one anymore (it used to be,
but we'd dropped ->d_lock, so additional ones might've
appeared, and we need to repeat the retain_dentry()-related
checks).

If it fails, we call __lock_parent().  Which
	* grabs RCU lock
	* drops ->d_lock (now we are not holding ->d_lock
on anything).
	* fetches ->d_parent.  Note the READ_ONCE() there -
it's *NOT* stable (no ->d_lock held).  We can't expect
that ->d_parent won't change or that the reference it used
to contribute to parent's refcount is there anymore; as
the matter of fact, the only thing that prevents outright
_freeing_ of the object 'parent' points to is rcu_read_lock()
and RCU delay between dropping the last reference and
actual freeing of the sucker.  rcu_read_lock() is there,
though, which makes it safe to grab ->d_lock on 'parent'.

That 'parent' might very well have nothing to do with our
dentry by now.  We can check if it's equal to its
->d_parent, though.  dentry->d_parent is *NOT* stable
at that point.  It might be changing right now.

However, the first store to dentry->d_parent making it
not equal to parent would have been done under parent->d_lock.
And since we are holding parent->d_lock, we won't miss that
store.  We might miss subsequent ones, but if we observe
dentry->d_parent == parent, we know that it's stable.  And
if we see dentry->d_parent != parent, we know that dentry
has moved around and we need to retry anyway.

If we do *not* need to retry, we have
	* inode->i_lock held
	* parent->d_lock held
	* dentry->d_parent == parent
	* dentry->d_inode == inode
At that point we can drop rcu_read_lock() - parent can't be
freed under us without dentry->d_parent switched from
parent to something else, and anyone trying to do that will
spin on parent->d_lock.

And now we can grab dentry->d_lock - ordering is guaranteed.
The only minor twist is that once upon a time __lock_parent()
used to have to cope with the possibility of dentry becoming
detached while we were retrying; in that case we obviously
don't want to grab dentry->d_lock - we are holding it already
(as parent->d_lock).

These days (since "make non-exchanging __d_move() copy ->d_parent
rather than swap them") if IS_ROOT(dentry) is false at some
point, it *never* will become true, so this check could've
been dropped, along with the checks for __lock_parent() return
value possibly being NULL.

I'm not saying it's not subtle and nasty - any time you see
READ_ONCE() (or rcu_read_lock(), for that matter), expect
headache.  But trylock is absolutely innocious part here; we
are not even retrying it once.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux