[PATCH] NFS: don't unhash dentry during unlink.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



NFS unlink() must determine if the file is open, and must perform a
"silly rename" instead of an unlink if it is.  Otherwise the client
might hold a file open which has been removed on the server.

Consequently if it determines that the file isn't open, it must block
any subsequent opens until the unlink has been completed on the server.

This is currently achieved by unhashing the dentry.  This forces any
open attempt to the slow-path for lookup which will block on i_sem on
the directory until the unlink completes.  A proposed patch will change
the VFS to only get a shared lock on i_sem for unlink, so this will no
longer work.

Instead we introduce an explicit interlock.  A flag is set on the dentry
while the unlink is running and ->d_revalidate blocks while that flag is
set.  This closes the race without requiring exclusion on i_sem.
unlink will still have exclusion on the dentry being unlinked, so it
will be safe to set and then clear the flag without any risk of another
thread touching the flag.

There is little room for adding new dentry flags, so instead of adding a
new flag, we overload an existing flag which is not used by NFS.

DCACHE_DONTCACHE is only set for filesystems which call
d_mark_dontcache() and NFS never calls this, so it is currently unused
in NFS.
DCACHE_DONTCACHE is only tested when the last reference on a dentry has
been dropped, so it is safe for NFS to set and then clear the flag while
holding a reference - the setting of the flag cannot cause a
misunderstanding.

So we define DCACHE_NFS_PENDING_UNLINK as an alias for DCACHE_DONTCACHE
and add a definition to nfs_fs.h so that if NFS ever does find a need to
call d_mark_dontcache() the build will fail with a suitable error.

Signed-off-by: NeilBrown <neilb@xxxxxxx>
---

Hi Trond/Anna,
 this patch is a precursor for my parallel-directory-updates patch set.
 I would be particularly helpful if this (and the nfsd patches I
 recently sent) could land for the next merge window.  Then I could post
 a substantially reduced series to implement parallel directory
 updates, which would then be easier for other to review.

Thanks,
NeilBrown


 fs/nfs/dir.c           | 23 ++++++++++++++++-------
 include/linux/nfs_fs.h | 14 ++++++++++++++
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 0c4e8dd6aa96..695bb057cbd2 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1778,6 +1778,8 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
 	int ret;
 
 	if (flags & LOOKUP_RCU) {
+		if (dentry->d_flags & DCACHE_NFS_PENDING_UNLINK)
+			return -ECHILD;
 		parent = READ_ONCE(dentry->d_parent);
 		dir = d_inode_rcu(parent);
 		if (!dir)
@@ -1786,6 +1788,9 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
 		if (parent != READ_ONCE(dentry->d_parent))
 			return -ECHILD;
 	} else {
+		/* Wait for unlink to complete */
+		wait_var_event(&dentry->d_flags,
+			       !(dentry->d_flags & DCACHE_NFS_PENDING_UNLINK));
 		parent = dget_parent(dentry);
 		ret = reval(d_inode(parent), dentry, flags);
 		dput(parent);
@@ -2454,7 +2459,6 @@ static int nfs_safe_remove(struct dentry *dentry)
 int nfs_unlink(struct inode *dir, struct dentry *dentry)
 {
 	int error;
-	int need_rehash = 0;
 
 	dfprintk(VFS, "NFS: unlink(%s/%lu, %pd)\n", dir->i_sb->s_id,
 		dir->i_ino, dentry);
@@ -2469,15 +2473,20 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry)
 		error = nfs_sillyrename(dir, dentry);
 		goto out;
 	}
-	if (!d_unhashed(dentry)) {
-		__d_drop(dentry);
-		need_rehash = 1;
-	}
+	/* We must prevent any concurrent open until the unlink
+	 * completes.  ->d_revalidate will wait for DCACHE_NFS_PENDING_UNLINK
+	 * to clear.  We set it here to ensure no lookup succeeds until
+	 * the unlink is complete on the server.
+	 */
+	dentry->d_flags |= DCACHE_NFS_PENDING_UNLINK;
+
 	spin_unlock(&dentry->d_lock);
 	error = nfs_safe_remove(dentry);
 	nfs_dentry_remove_handle_error(dir, dentry, error);
-	if (need_rehash)
-		d_rehash(dentry);
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags &= ~DCACHE_NFS_PENDING_UNLINK;
+	spin_unlock(&dentry->d_lock);
+	wake_up_var(&dentry->d_flags);
 out:
 	trace_nfs_unlink_exit(dir, dentry, error);
 	return error;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a17c337dbdf1..041a6076e045 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -617,6 +617,20 @@ nfs_fileid_to_ino_t(u64 fileid)
 
 #define NFS_JUKEBOX_RETRY_TIME (5 * HZ)
 
+/* We need to block new opens while a file is being unlinked.
+ * If it is opened *before* we decide to unlink, we will silly-rename
+ * instead. If it is opened *after*, then we the open to fail unless it creates
+ * a new file.
+ * If we allow the open and unlink to race, we could end up with a file that is
+ * open but deleted on the server resulting in ESTALE.
+ * So overload DCACHE_DONTCACHE to record when the unlink is happening
+ * and block dentry revalidation while it is set.
+ * DCACHE_DONTCACHE is only used by filesystems which call d_mark_dontcache()
+ * which NFS never calls.  It is only tested on a dentry on which all references
+ * have been dropped, so it is safe for NFS to set it while holding a reference.
+ */
+#define DCACHE_NFS_PENDING_UNLINK DCACHE_DONTCACHE
+#define d_mark_dontcache(i) BUILD_BUG_ON_MSG(1, "NFS cannot use d_mark_dontcache()")
 
 # undef ifdebug
 # ifdef NFS_DEBUG
-- 
2.36.1





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux