[PATCH] [REPOST v2] fuse: drop dentry on failed revalidate

Anand Avati <avati@xxxxxxxxxx> · Thu, 25 Jul 2013 01:55:44 -0400

Consider the following sequence of operations:

  // mount same backend at two places

  # mount -t fuse <some_src> /mnt1
  # mount -t fuse <same_src> /mnt2

  // create a directory chain from 1

  $ mkdir -p /mnt1/a/b

  // load it in 2's cache

  $ stat /mnt2/a/b     # load it in cache

  // recreate same names from 1

  $ rm -rf /mnt1/a
  $ mkdir -p /mnt1/a/b

  // sleep long enough for entry_timeout to expire

  $ sleep 5

  // access /mnt2/a/b from two threads in parallel

  $ stat /mnt2/a/b & stat /mnt2/a/b

Depending on the race, none/either/both of the commands
executed in the last step can fail.

This is because both the stat command threads execute the
resolver in parallel.

- The resolver function lookup_fast() acquires a reference to the
  dentry of /mnt2/a with __d_lookup().

- The call to d_revalidate() on the just-acquired dentry fails
  (i.e. returns 0), as FUSE gets a new nodeid from the server.

- In the meantime, another resolver thread enters lookup_fast()
  and acquires the dentry of /mnt2/a with __d_lookup(), effectively
  making dentry->d_count > 1 [+ child refs].

- Now when the first thread calls d_invalidate() because of the failed
  d_revalidate(), d_invalidate() finds that even after calling
  shrink_dcache_parent() we are left with d_count > 1, and fails
  with EBUSY.

- The failed d_invalidate() makes the resolver use this "stale" dentry
  as the result of this walk_component() call -- even though it just
  witnessed d_revalidate() fail on it, only because d_invalidate()
  could not succeed because of an innocent concurrent resolver in
  progress.

- Using the stale dentry (and inode), the call proceeds and stumbles
  with an error, as the FUSE server is presented with a dead inode.

- The other thread fails in d_revalidate() too and, depending on the
  relative progress of the two, the second thread's d_invalidate()
  might get an EBUSY too and stumble in the same way as the first
  thread.

If the same stat commands were issued serially, both would succeed.

NFS faces a similar situation to FUSE (as it does in many other ways)
and handles it by checking for submounts and conditionally calling
d_drop(). The call to d_drop() within ->d_revalidate() guarantees the
success of d_invalidate(), so a fresh lookup is issued from there on.
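
The interaction described above can be illustrated with a small
user-space model. This is only a sketch, not kernel code: struct
model_dentry and the model_*() functions are hypothetical stand-ins,
the d_invalidate() model is a simplification of the behaviour
described in this message, and the have_submounts() check and
shrink_dcache_parent() call are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct dentry (not kernel code). */
struct model_dentry {
	int  d_count;	/* references held, including concurrent resolvers */
	bool is_dir;	/* positive dentry that refers to a directory */
	bool hashed;	/* still findable by a dcache lookup */
};

/* Model of d_drop(): unhash unconditionally, whatever the refcount. */
void model_d_drop(struct model_dentry *d)
{
	d->hashed = false;
}

/*
 * Simplified model of d_invalidate() as described in this message:
 * an already unhashed dentry is a success, while a directory that is
 * still referenced by someone else is refused with -EBUSY.
 */
int model_d_invalidate(struct model_dentry *d)
{
	if (!d->hashed)
		return 0;
	if (d->d_count > 1 && d->is_dir)
		return -EBUSY;
	model_d_drop(d);
	return 0;
}

/*
 * Model of the resolver step taken after ->d_revalidate() returned 0.
 * drop_in_revalidate selects the behaviour proposed by this patch.
 * Returns true if the walk carries on with the stale dentry.
 */
bool model_walk_uses_stale(struct model_dentry *d, bool drop_in_revalidate)
{
	if (drop_in_revalidate)
		model_d_drop(d);
	return model_d_invalidate(d) != 0;
}
```

With d_count == 2 (one concurrent resolver) on a hashed directory,
model_walk_uses_stale(&d, false) is true, while the same call with
drop_in_revalidate set is false, because the dentry is already
unhashed by the time d_invalidate() looks at it.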

Signed-off-by: Anand Avati <avati@xxxxxxxxxx>
---

Background:

The previous submission of this patch (on fuse-devel) drew review
comments asking to investigate doing a d_drop() on the entire subtree
rather than just on the entry. That approach seems to be very complex,
so I am reposting the same patch to restart the discussion. This patch
follows the NFS approach to the problem.

 fs/fuse/dir.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index a1d9047..83c217e 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 		if (!err) {
 			struct fuse_inode *fi = get_fuse_inode(inode);
 			if (outarg.nodeid != get_node_id(inode)) {
+				if (!have_submounts(entry)) {
+					shrink_dcache_parent(entry);
+					d_drop(entry);
+				}
 				fuse_queue_forget(fc, forget, outarg.nodeid, 1);
 				return 0;
 			}
-- 
1.7.12.1
