On Jul 4, 2016, at 10:28 PM, Oleg Drokin wrote:
>
> On Jul 3, 2016, at 2:29 AM, Al Viro wrote:
>
>> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>>
>>> Sorry to nag you about this, but did any of those pan out?
>>>
>>> d_alloc_parallel() sounds like a bit too heavy there, esp. considering we came in
>>> with a dentry already (though a potentially shared one, I understand).
>>> Would it not be better to try and establish some dentry locking rule for calling
>>> into d_splice_alias() instead? At least then the callers could make sure the
>>> dentry does not change under them.
>>> Though I guess if there's dentry locking like that, we might as well do all the
>>> checking in d_splice_alias(), but that means unhashed dentries would no longer
>>> be disallowed, which is a change of semantics from now.
>>
>> FWIW, the only interesting case here is this:
>> 	* no O_CREAT in flags (otherwise the parent is held exclusive).
>> 	* dentry is found in hash
>> 	* dentry is negative
>> 	* dentry has passed ->d_revalidate() (i.e. in case of
>> NFS it had nfs_neg_need_reval() return false).
>>
>> Only two instances are non-trivial in that respect - NFS and Lustre.
>> Everything else will simply fail open() with ENOENT in that case.
>>
>> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
>> finish_no_open and bugger off in case it's not in_lookup, otherwise do
>> pretty much what we do in case we'd got in_lookup from the very beginning.
>> Some adjustments are needed for that case (basically, we need to make
>> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
>> with refcounting correctly).
>>
>> Tentative NFS patch follows; I don't understand Lustre well enough, but it
>> looks like a plausible strategy there as well.
>
> This patch seems to have brought the other crash back (or something similar),
> the one with a negative dentry being hashed when it's already hashed.
> It's not as easy to hit as before, but a lot easier than the race we are hitting here.
>
> Also, in all cases it is ls that is crashing now, which seems to indicate it is
> taking some of the new paths added.
> I have half a dozen crash dumps if you want me to look something up in them.
> This is 4.7.0-rc6 a99cde438de0c4c0cecc1d1af1a55a75b10bfdef with just your patch
> on top.
>
> I can reproduce it with both the complex workload and with a simplified one
> (attached); it takes anywhere from 5 to 30 minutes to hit.
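(The simplified reproducer itself is an attachment and is not included here. Purely as a hypothetical illustration of the kind of workload being described, not Oleg's actual script: a couple of threads doing stat() and no-O_CREAT open() of a name that another thread keeps creating and unlinking on an NFS mount, so hashed negative dentries are looked up in parallel. The mount path and iteration count below are invented.)

/* Hypothetical sketch only; NOT the attached reproducer. */
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>

#define RACE_PATH "/mnt/nfs/racefile"	/* invented NFS mount path */
#define LOOPS 1000000L

static void *opener(void *arg)
{
	(void)arg;
	for (long i = 0; i < LOOPS; i++) {
		struct stat st;
		int fd;

		stat(RACE_PATH, &st);		/* populate/revalidate the dentry */
		fd = open(RACE_PATH, O_RDONLY);	/* no O_CREAT: parent held shared */
		if (fd >= 0)
			close(fd);
	}
	return NULL;
}

static void *creator(void *arg)
{
	(void)arg;
	for (long i = 0; i < LOOPS; i++) {
		int fd = open(RACE_PATH, O_CREAT | O_WRONLY, 0644);

		if (fd >= 0)
			close(fd);
		unlink(RACE_PATH);		/* make the name negative again */
	}
	return NULL;
}

int main(void)
{
	pthread_t t[3];
	int i;

	/* two concurrent no-O_CREAT openers so the lookups really run in parallel */
	pthread_create(&t[0], NULL, opener, NULL);
	pthread_create(&t[1], NULL, opener, NULL);
	pthread_create(&t[2], NULL, creator, NULL);
	for (i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
	return 0;
}

Compiled with gcc -pthread and pointed at a file on an NFS mount, this is only meant to show the shape of the workload: parallel lookups of a name that keeps flipping between negative and positive, with the opens themselves done without O_CREAT.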
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index d8015a03..5474e39 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>>  			struct file *file, unsigned open_flags,
>>  			umode_t mode, int *opened)
>>  {
>> +	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
>>  	struct nfs_open_context *ctx;
>>  	struct dentry *res;
>>  	struct iattr attr = { .ia_valid = ATTR_OPEN };
>>  	struct inode *inode;
>>  	unsigned int lookup_flags = 0;
>> +	bool switched = false;
>>  	int err;
>>
>>  	/* Expect a negative dentry */
>> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>>  		attr.ia_size = 0;
>>  	}
>>
>> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
>> +		d_drop(dentry);
>> +		switched = true;
>> +		dentry = d_alloc_parallel(dentry->d_parent,
>> +					  &dentry->d_name, &wq);
>
> Hm, d_alloc_parallel() can return some preexisting dentry it was able to look up
> in RCU mode, attached to the same parent with the same name, it seems?
> What's to stop a parallel thread from rehashing the dentry after we dropped it
> here, so that d_alloc_parallel() will happily find it?
> I am running a test to verify this theory now.

Ok, so at least in my case we do not come out of d_alloc_parallel() with a hashed
dentry, though I wonder why you are not trying to catch this case here.
If you rely on the !d_in_lookup(dentry) below, that seems to be racy: if it was
__d_lookup_rcu() that found the dentry, we do not seem to perform any additional
checks on it and just return it.
But what if it was just added by a parallel caller that has just finished
d_splice_alias() (so it is hashed and not rejected by __d_lookup_rcu()) and at the
same time has not yet progressed all the way to the d_lookup_done() call below?

Interesting that in some of the dumps d_hash is all zeroes, which means the dentry
is unhashed, yet the assertion was triggered, so there was some parallel caller
that performed a d_drop on us (but nothing of interest on the other CPUs).
In all cases d_flags = 140, so we do not have DCACHE_PAR_LOOKUP set at the point
of the crash.
Also, in some of the dumps the dentry is actually positive, not negative.

>
>> +		if (IS_ERR(dentry))
>> +			return PTR_ERR(dentry);
>> +		if (unlikely(!d_in_lookup(dentry)))
>> +			return finish_no_open(file, dentry);
>> +	}
>> +
>>  	ctx = create_nfs_open_context(dentry, open_flags);
>>  	err = PTR_ERR(ctx);
>>  	if (IS_ERR(ctx))
>> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>>  	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>>  	put_nfs_open_context(ctx);
>>  out:
>> +	if (unlikely(switched)) {
>> +		d_lookup_done(dentry);
>> +		dput(dentry);
>> +	}
>>  	return err;
>>
>>  no_open:
>>  	res = nfs_lookup(dir, dentry, lookup_flags);
>> -	err = PTR_ERR(res);
>> +	if (switched) {
>> +		d_lookup_done(dentry);
>> +		if (!res)
>> +			res = dentry;
>> +		else
>> +			dput(dentry);
>> +	}
>>  	if (IS_ERR(res))
>> -		goto out;
>> -
>> +		return PTR_ERR(res);
>>  	return finish_no_open(file, res);
>>  }
>>  EXPORT_SYMBOL_GPL(nfs_atomic_open);
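(For reference, the control flow being debated, condensed from the quoted patch into a single skeleton. This is a paraphrase only, not a separately tested or buildable replacement; the NFS-specific open/lookup logic and the no_open handling are elided and replaced by a stand-in, and the function name is made up.)

#include <linux/fs.h>
#include <linux/dcache.h>
#include <linux/wait.h>

/*
 * Condensed paraphrase of the !O_CREAT handling in the patch above: if we
 * came in with a hashed (negative, already past ->d_revalidate) dentry and
 * no O_CREAT, swap it for an in-lookup dentry; if somebody else already owns
 * the name, bail out through finish_no_open().
 */
static int atomic_open_sketch(struct inode *dir, struct dentry *dentry,
			      struct file *file, unsigned int open_flags)
{
	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
	bool switched = false;
	int err;

	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
		d_drop(dentry);		/* unhash the negative dentry we came in with */
		switched = true;
		dentry = d_alloc_parallel(dentry->d_parent,
					  &dentry->d_name, &wq);
		if (IS_ERR(dentry))
			return PTR_ERR(dentry);
		if (!d_in_lookup(dentry))
			/*
			 * d_alloc_parallel() found an existing hashed dentry
			 * instead of starting a new lookup; this is the window
			 * questioned above, since that dentry may have been
			 * spliced in by a caller that has not yet reached its
			 * own d_lookup_done().
			 */
			return finish_no_open(file, dentry);
	}

	/*
	 * ... the NFS-specific open/lookup logic of the real function goes
	 * here and sets err; elided in this sketch ...
	 */
	err = -ENOENT;

	if (switched) {
		/* every d_alloc_parallel() must be paired with d_lookup_done() */
		d_lookup_done(dentry);
		dput(dentry);
	}
	return err;
}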