Re: dcache locking question

On Fri, Mar 15, 2019 at 10:38:23AM -0700, Eric Biggers wrote:
> On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote:
> > 
> > If it fails, we call __lock_parent().  Which
> > 	* grabs RCU lock
> > 	* drops ->d_lock (now we are not holding ->d_lock
> > on anything).
> > 	* fetches ->d_parent.  Note the READ_ONCE() there -
> > it's *NOT* stable (no ->d_lock held).  We can't expect
> > that ->d_parent won't change or that the reference it used
> > to contribute to parent's refcount is there anymore; as
> > a matter of fact, the only thing that prevents outright
> > _freeing_ of the object 'parent' points to is rcu_read_lock()
> > and RCU delay between dropping the last reference and
> > actual freeing of the sucker.  rcu_read_lock() is there,
> > though, which makes it safe to grab ->d_lock on 'parent'.
> > 
> > That 'parent' might very well have nothing to do with our
> > dentry by now.  We can check if it's equal to its
> > ->d_parent, though.  dentry->d_parent is *NOT* stable
> > at that point.  It might be changing right now.
> > 
> > However, the first store to dentry->d_parent making it
> > not equal to parent would have been done under parent->d_lock.
> > And since we are holding parent->d_lock, we won't miss that
> > store.  We might miss subsequent ones, but if we observe
> > dentry->d_parent == parent, we know that it's stable.  And
> > if we see dentry->d_parent != parent, we know that dentry
> > has moved around and we need to retry anyway.
> 
> Why isn't it necessary to use READ_ONCE(dentry->d_parent) here?
> 
> 	if (unlikely(parent != dentry->d_parent)) {
> 
> Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is
> concurrently changed to 0xBBBBBBBB.
> 
> d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it
> appearing that d_parent == 0xAAAABBBB == parent.
> 
> Yes it won't really be compiled as that in practice, but I thought the point of
> READ_ONCE() is to *guarantee* it's really done right...

READ_ONCE does not add any extra guarantees of atomicity.  Fetches and stores
of pointers are atomic, period; if that ever breaks, we are in very deep
trouble all over the place.
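
To make the point above concrete, here is a rough user-space sketch of what a READ_ONCE-style macro boils down to: a single volatile access that stops the compiler from refetching or merging the load.  It adds nothing on top of the atomicity an aligned pointer-sized load already has.  READ_ONCE_SKETCH, struct obj, parent_is_aligned() and fetch_parent() are made-up names for illustration, not the kernel definitions, and __typeof__ is assumed to be available (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only, not the kernel macro: force exactly one access to x
 * by reading it through a volatile-qualified lvalue.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for an object with a parent pointer. */
struct obj {
	struct obj *parent;
};

/* A naturally aligned pointer field is what makes the plain load a single access. */
static int parent_is_aligned(const struct obj *o)
{
	return ((uintptr_t)&o->parent % _Alignof(struct obj *)) == 0;
}

/* Fetch o->parent once; the compiler may not split, repeat or cache this load. */
static struct obj *fetch_parent(struct obj *o)
{
	assert(parent_is_aligned(o));
	return READ_ONCE_SKETCH(o->parent);
}
```

The volatile access only constrains the compiler; whether the value is still current is a separate question, which is what the locking argument that follows is about.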

What's more, spin_lock acts as a compiler barrier and, on SMP, is an ACQUIRE
operation.  So that second fetch of ->d_parent will happen after we grab
parent->d_lock, from everyone's POV.  Critical areas for the same spinlock
are ordered wrt each other.  So we have observed
	FETCH dentry->d_parent => parent
	LOCK parent->d_lock
	FETCH dentry->d_parent => parent
All stores to dentry->d_parent are done with ->d_lock held on dentry, on the
old value of dentry->d_parent *and* on the new value of dentry->d_parent.  So
the second fetch is ordered wrt all stores making dentry->d_parent change
to parent and all stores making it change *from* parent.  We might miss
some stores changing it from one value other than parent to another value
other than parent, but the predicate itself is fine and will stay fine until we drop

Paul, could you comment on that one?  The function in question is
static struct dentry *__lock_parent(struct dentry *dentry)
{
        struct dentry *parent;
        rcu_read_lock();
        spin_unlock(&dentry->d_lock);
again:
        parent = READ_ONCE(dentry->d_parent);
        spin_lock(&parent->d_lock);
        /*
         * We can't blindly lock dentry until we are sure
         * that we won't violate the locking order.
         * Any changes of dentry->d_parent must have
         * been done with parent->d_lock held, so
         * spin_lock() above is enough of a barrier
         * for checking if it's still our child.
         */
        if (unlikely(parent != dentry->d_parent)) {
                spin_unlock(&parent->d_lock);
                goto again;
        }
        rcu_read_unlock();
        if (parent != dentry)
                spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
        else
                parent = NULL;
        return parent;
}
(in fs/dcache.c) and all stores to ->d_parent are guaranteed to be done
under ->d_lock on dentry itself and ->d_lock on both old and new values
of ->d_parent.
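
For anyone who wants to play with the pattern outside the kernel, the fetch / lock / recheck loop can be approximated in user space roughly as follows.  This is a simplified sketch, not the dcache code: struct node, node_init(), lock_parent_sketch() and reparent_sketch() are hypothetical names, pthread mutexes stand in for ->d_lock, C11 relaxed atomics stand in for the plain pointer accesses, nodes are assumed never to be freed (so there is no RCU analogue), the caller holds no lock on entry, and reparenting is assumed to be serialized by the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry; a root node is its own parent. */
struct node {
	pthread_mutex_t lock;
	_Atomic(struct node *) parent;
};

static void node_init(struct node *n, struct node *parent)
{
	pthread_mutex_init(&n->lock, NULL);
	atomic_init(&n->parent, parent ? parent : n);
}

/*
 * Returns the parent of n with the parent's lock held, or NULL (no lock
 * held) if n is a root.  Every store to n->parent that changes it away
 * from p is done under p->lock, so the recheck made while holding
 * p->lock cannot miss such a store.
 */
static struct node *lock_parent_sketch(struct node *n)
{
	for (;;) {
		struct node *p = atomic_load_explicit(&n->parent,
						      memory_order_relaxed);
		if (p == n)
			return NULL;
		pthread_mutex_lock(&p->lock);
		if (p == atomic_load_explicit(&n->parent, memory_order_relaxed))
			return p;	/* stable until p->lock is dropped */
		pthread_mutex_unlock(&p->lock);	/* n moved; retry */
	}
}

/* Writer side: store under the locks of both the old and the new parent. */
static void reparent_sketch(struct node *n, struct node *newp)
{
	struct node *oldp;

	assert(newp != NULL && newp != n);
	oldp = lock_parent_sketch(n);
	if (oldp != newp)
		pthread_mutex_lock(&newp->lock);
	atomic_store_explicit(&n->parent, newp, memory_order_relaxed);
	if (oldp != newp)
		pthread_mutex_unlock(&newp->lock);
	if (oldp)
		pthread_mutex_unlock(&oldp->lock);
}
```

The sketch leaves out the child's own lock and the nested locking order, which the real function has to respect; it is only meant to show why observing parent == n->parent under the parent's lock is enough.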
