Consider the following sequence of operations: // mount same backend at two places # mount -t fuse <some_src> /mnt1 # mount -t fuse <same_src> /mnt2 // create a directory chain from 1 $ mkdir -p /mnt1/a/b // load it in 2's cache $ stat /mnt2/a/b # load it in cache // recreate same names from 1 $ rm -rf /mnt1/a $ mkdir -p /mnt1/a/b // sleep long enough for entry_timeout to expire $ sleep 5 // access /mnt2/a/b from two threads in parallel $ stat /mnt2/a/b & stat /mnt2/a/b Depending on the race, none/either/both of the commands executed in the last step can fail. This is because both the stat command threads execute the resolver in parallel. - The resolver function lookup_fast() will acquire the dentry (of /mnt2/a) reference with __d_lookup() - Call to d_revalidate() on the just acquired dentry will fail, (i.e return 0) as FUSE gets a new nodeid from the server. - In the mean time another resolver thread enters lookup_fast() and acquires the dentry of /mnt2/a with __d_lookup(), effectively making dentry->d_count > 1 [+ child refs] - Now when first thread calls d_invalidate() because of the failed d_revalidate(), d_invalidate() will find that even after calling shrink_dcache_parent() we are left with d_count > 1, and fails d_invalidate() with EBUSY. - The failed d_invalidate() makes the resolver use this "stale" dentry as the result of this walk_component() call -- even though it just witnessed d_revalidate() fail on it, only because d_invalidate() could not succeed because of an innocent concurrent resolver in progress. - Using the stale dentry (and inode), the call progress and stubles with an error as the FUSE server is presented with a dead inode. - The other thread would fail in d_revalidate() too, and depending on the progress relaitvely made between the two, the second thread's d_invalidate() might get an EBUSY too, and stuble in the same way as the first thread. If the same stat commands were issued serially, both would succeed. NFS is faced with a similar situation as FUSE (and in many other ways in general too) and it checks for a submounts and conditionally calls d_drop(). The call to d_drop() within ->d_revalidate() guarantees the success of d_invalidate(), and a fresh lookup would be issued there on. Signed-off-by: Anand Avati <avati@xxxxxxxxxx> --- Background: The previous submission of this patch (on fuse-devel) had review comments to investigate doing a d_drop() on the entire subtree rather than just on the entry. That approach seems to be very complex. So reposting the same patch to kick in the discussion again. This patch follows the NFS approach to the problem. fs/fuse/dir.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index a1d9047..83c217e 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) if (!err) { struct fuse_inode *fi = get_fuse_inode(inode); if (outarg.nodeid != get_node_id(inode)) { + if (!have_submounts(entry)) { + shrink_dcache_parent(entry); + d_drop(entry); + } fuse_queue_forget(fc, forget, outarg.nodeid, 1); return 0; } -- 1.7.12.1 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html