On Mon, Feb 29, 2016 at 2:09 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > On Sun, Feb 28, 2016 at 08:01:01PM +0000, Al Viro wrote: >> On Sun, Feb 28, 2016 at 05:01:34PM +0000, Al Viro wrote: >> >> > Erm... What's to order ->d_inode and ->d_flags fetches there? David? >> > Looks like the barrier in d_is_negative() is on the wrong side of fetch. >> > Confused... >> >> OK, as per David's suggestion, let's flip them around, bringing the >> barrier in d_is_negative() between them. Dmitry, could you try this on >> top of mainline? Again, it's until the first warning. > > Hmm... Reordering is definitely wrong, but what I really wonder is if > dentry_rcuwalk_invalidate() is right outside of __d_drop(). IOW, is > it right in __d_instantiate() and dentry_unlink_inode()? The code > dealing with ->d_flags in RCU mode is more interested in coherency between > ->d_flags and ->d_inode and it looks like we'd need *two* increments - > even-to-odd before updating both and odd-to-even after both are in sync. > The more I look at the situation with d_is_...() wrt barriers and ->d_seq, > the less I understand it; outside of RCU mode we don't really need the > barriers for that stuff and in RCU mode ->d_flags handling had been > a serious headache all along... > > I'm tempted to do as below; the amount of smp_wmb() remains the same and > so's the amount of stores (splitting that += 2 in two doesn't affect that - > we dirty the same cacheline before and after anyway). OTOH, that would > mean that ->d_seq match guarantees ->d_flags and ->d_inode being in sync. > And I suspect that we could drop _read_ barriers in d_is_...() after that; > in non-RCU mode we don't give a damn anyway and in RCU one ->d_seq check > would provide one; it doesn't really matter on x86, but smp_rmb() may be > costly. Splitting ->d_seq increments *does* matter on x86 wrt correctness; > in-between state becomes guaranteed ->d_seq mismatch and that just might > be it. Dmitry, could you add this on top of the previous patch? Regardless of whether reordering is wrong or not, do we see how it can fix the WARNINGs/oopses? Because it does seem to. I've tried to revert just this part: - *inode = d_backing_inode(dentry); negative = d_is_negative(dentry); + *inode = d_backing_inode(dentry); And got: [ 976.609688] WARNING: CPU: 0 PID: 12126 at fs/namei.c:1587 lookup_fast+0x3fa/0x450() [ 976.626768] WARNING: CPU: 0 PID: 12126 at fs/namei.c:3123 path_openat+0x12bc/0x1520() in 15 minutes. In particular, applying this on top the previous patch will be inconclusive, because I already don't see the warnings. > David, Linus, do you see any problems with that? To me it looks saner > that way and as cheap as the current code, but I might be missing something > here... > > diff --git a/fs/dcache.c b/fs/dcache.c > index 92d5140..2c08cce 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -279,7 +279,6 @@ static inline void __d_set_inode_and_type(struct dentry *dentry, > unsigned flags; > > dentry->d_inode = inode; > - smp_wmb(); > flags = READ_ONCE(dentry->d_flags); > flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU); > flags |= type_flags; > @@ -300,7 +299,6 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry) > > flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU); > WRITE_ONCE(dentry->d_flags, flags); > - smp_wmb(); > dentry->d_inode = NULL; > } > > @@ -370,9 +368,11 @@ static void dentry_unlink_inode(struct dentry * dentry) > __releases(dentry->d_inode->i_lock) > { > struct inode *inode = dentry->d_inode; > + > + raw_write_seqcount_begin(&dentry->d_seq); > __d_clear_type_and_inode(dentry); > hlist_del_init(&dentry->d_u.d_alias); > - dentry_rcuwalk_invalidate(dentry); > + raw_write_seqcount_end(&dentry->d_seq); > spin_unlock(&dentry->d_lock); > spin_unlock(&inode->i_lock); > if (!inode->i_nlink) > @@ -1758,8 +1758,9 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode) > spin_lock(&dentry->d_lock); > if (inode) > hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry); > + raw_write_seqcount_begin(&dentry->d_seq); > __d_set_inode_and_type(dentry, inode, add_flags); > - dentry_rcuwalk_invalidate(dentry); > + raw_write_seqcount_end(&dentry->d_seq); > spin_unlock(&dentry->d_lock); > fsnotify_d_instantiate(dentry, inode); > } -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html