Re: [PATCH] vfs: fix race in rcu lookup of pruned dentry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 18, 2011 at 12:47 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> Huh?  We do __d_drop() in there, and do that before we start messing
> with ->d_inode.

Hmm. Yes, looking at it, the ordering all seems correct. But then what
did Hugh see at all?

The inode thing he got from d_inode is re-verified by
__d_lookup_rcu(). So if inode is NULL, that means that the other CPU
has done dentry_iput(), which means that __d_drop has already
happened, which means that the dentry has been removed from the hash
list *and* the count has been incremented.

So just judging by Hugh's thing, something is wrong there. Some
missing barrier, perhaps. But write_seqcount_barrier() does seem to
have the write barrier, and the __d_lookup_rcu() read barrier check is
using the proper "full" read barrier too.

So in order for __d_lookup_rcu() to return a NULL inode, we have the
following requirements:
 - it needs to find the dentry (duh)
 - the dentry sequence count gets read (proper read barrier after this)
 - the dentry must still look hashed
 - the dentry->d_inode must be NULL
 - and the dentry sequence count gets re-read (proper read barrier
before this) and must match.

And right now I don't see how that can happen, exactly because
__d_lookup_rcu does the sequence point check. But that's what Hugh's
patch depends on: seeing a NULL d_inode race.

If we see d_inode being NULL, that means that the first sequence
number read must happen *after* we've set d_inode to NULL in
dentry_iput(), which happens *after* we've done the sequence number
increment. That part is fine: that means that the sequence numbers
will match (because both of them are the "later" one). No
inconsistency so far. d_inode being NULL is perfectly compatible with
no sequence number change, and that was what I thought was a bug in
this area at first.

But if the first sequence number read (the read_seqcount_begin() in
__d_lookup_rcu()) sees the later sequence number, that means that
__d_drop has already happened, and it should *also* see the
dentry->d_hash.pprev = NULL; that happened in __d_drop before the
dentry_rcuwalk_barrier(). We have both the read barrier in the
read_seqcount_begin() and the write barrier in the
dentry_rcuwalk_barrier() to guarantee that.

But if the __d_lookup_rcu() sees that, then the d_unhashed() should
have seen the NULL pprev pointer, and not ever returned the dentry,
because that __d_lookup_rcu() loop does

                if (d_unhashed(dentry))
                        continue;

after reading the sequence count.

So how could Hugh's NULL inode ever happen in the first place? Even
with the current sources? It all looks solid to me now that I look at
all the details.

                           Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux