NFS unlink() must determine if the file is open, and must perform a "silly rename" instead of an unlink if it is. Otherwise the client might hold a file open which has been removed on the server. Consequently if it determines that the file isn't open, it must block any subsequent opens until the unlink has been completed on the server. This is currently achieved by unhashing the dentry. This forces any open attempt to the slow-path for lookup which will block on i_sem on the directory until the unlink completes. A proposed patch will change the VFS to only get a shared lock on i_sem for unlink, so this will no longer work. Instead we introduce an explicit interlock. A flag is set on the dentry while the unlink is running and ->d_revalidate blocks while that flag is set. This closes the race without requiring exclusion on i_sem. unlink will still have exclusion on the dentry being unlinked, so it will be safe to set and then clear the flag without any risk of another thread touching the flag. There is little room for adding new dentry flags, so instead of adding a new flag, we overload an existing flag which is not used by NFS. DCACHE_DONTCACHE is only set for filesystems which call d_mark_dontcache() and NFS never calls this, so it is currently unused in NFS. DCACHE_DONTCACHE is only tested when the last reference on a dentry has been dropped, so it is safe for NFS to set and then clear the flag while holding a reference - the setting of the flag cannot cause a misunderstanding. So we define DCACHE_NFS_PENDING_UNLINK as an alias for DCACHE_DONTCACHE and add a definition to nfs_fs.h so that if NFS ever does find a need to call d_mark_dontcache() the build will fail with a suitable error. Signed-off-by: NeilBrown <neilb@xxxxxxx> --- Hi Trond/Anna, this patch is a precursor for my parallel-directory-updates patch set. I would be particularly helpful if this (and the nfsd patches I recently sent) could land for the next merge window. Then I could post a substantially reduced series to implement parallel directory updates, which would then be easier for other to review. Thanks, NeilBrown fs/nfs/dir.c | 23 ++++++++++++++++------- include/linux/nfs_fs.h | 14 ++++++++++++++ 2 files changed, 30 insertions(+), 7 deletions(-) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 0c4e8dd6aa96..695bb057cbd2 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1778,6 +1778,8 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags, int ret; if (flags & LOOKUP_RCU) { + if (dentry->d_flags & DCACHE_NFS_PENDING_UNLINK) + return -ECHILD; parent = READ_ONCE(dentry->d_parent); dir = d_inode_rcu(parent); if (!dir) @@ -1786,6 +1788,9 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags, if (parent != READ_ONCE(dentry->d_parent)) return -ECHILD; } else { + /* Wait for unlink to complete */ + wait_var_event(&dentry->d_flags, + !(dentry->d_flags & DCACHE_NFS_PENDING_UNLINK)); parent = dget_parent(dentry); ret = reval(d_inode(parent), dentry, flags); dput(parent); @@ -2454,7 +2459,6 @@ static int nfs_safe_remove(struct dentry *dentry) int nfs_unlink(struct inode *dir, struct dentry *dentry) { int error; - int need_rehash = 0; dfprintk(VFS, "NFS: unlink(%s/%lu, %pd)\n", dir->i_sb->s_id, dir->i_ino, dentry); @@ -2469,15 +2473,20 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry) error = nfs_sillyrename(dir, dentry); goto out; } - if (!d_unhashed(dentry)) { - __d_drop(dentry); - need_rehash = 1; - } + /* We must prevent any concurrent open until the unlink + * completes. ->d_revalidate will wait for DCACHE_NFS_PENDING_UNLINK + * to clear. We set it here to ensure no lookup succeeds until + * the unlink is complete on the server. + */ + dentry->d_flags |= DCACHE_NFS_PENDING_UNLINK; + spin_unlock(&dentry->d_lock); error = nfs_safe_remove(dentry); nfs_dentry_remove_handle_error(dir, dentry, error); - if (need_rehash) - d_rehash(dentry); + spin_lock(&dentry->d_lock); + dentry->d_flags &= ~DCACHE_NFS_PENDING_UNLINK; + spin_unlock(&dentry->d_lock); + wake_up_var(&dentry->d_flags); out: trace_nfs_unlink_exit(dir, dentry, error); return error; diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index a17c337dbdf1..041a6076e045 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -617,6 +617,20 @@ nfs_fileid_to_ino_t(u64 fileid) #define NFS_JUKEBOX_RETRY_TIME (5 * HZ) +/* We need to block new opens while a file is being unlinked. + * If it is opened *before* we decide to unlink, we will silly-rename + * instead. If it is opened *after*, then we the open to fail unless it creates + * a new file. + * If we allow the open and unlink to race, we could end up with a file that is + * open but deleted on the server resulting in ESTALE. + * So overload DCACHE_DONTCACHE to record when the unlink is happening + * and block dentry revalidation while it is set. + * DCACHE_DONTCACHE is only used by filesystems which call d_mark_dontcache() + * which NFS never calls. It is only tested on a dentry on which all references + * have been dropped, so it is safe for NFS to set it while holding a reference. + */ +#define DCACHE_NFS_PENDING_UNLINK DCACHE_DONTCACHE +#define d_mark_dontcache(i) BUILD_BUG_ON_MSG(1, "NFS cannot use d_mark_dontcache()") # undef ifdebug # ifdef NFS_DEBUG -- 2.36.1