Re: [PATCH] NFS: don't unhash dentry during unlink.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2022-07-06 at 15:10 +1000, NeilBrown wrote:
> NFS unlink() must determine if the file is open, and must perform a
> "silly rename" instead of an unlink if it is.  Otherwise the client
> might hold a file open which has been removed on the server.
> 
> Consequently if it determines that the file isn't open, it must block
> any subsequent opens until the unlink has been completed on the server.
> 
> This is currently achieved by unhashing the dentry.  This forces any
> open attempt to the slow-path for lookup which will block on i_sem on
> the directory until the unlink completes.  A proposed patch will change
> the VFS to only get a shared lock on i_sem for unlink, so this will no
> longer work.
> 
> Instead we introduce an explicit interlock.  A flag is set on the dentry
> while the unlink is running and ->d_revalidate blocks while that flag is
> set.  This closes the race without requiring exclusion on i_sem.
> unlink will still have exclusion on the dentry being unlinked, so it
> will be safe to set and then clear the flag without any risk of another
> thread touching the flag.
> 
> There is little room for adding new dentry flags, so instead of adding a
> new flag, we overload an existing flag which is not used by NFS.
> 
> DCACHE_DONTCACHE is only set for filesystems which call
> d_mark_dontcache() and NFS never calls this, so it is currently unused
> in NFS.
> DCACHE_DONTCACHE is only tested when the last reference on a dentry has
> been dropped, so it is safe for NFS to set and then clear the flag while
> holding a reference - the setting of the flag cannot cause a
> misunderstanding.
> 
> So we define DCACHE_NFS_PENDING_UNLINK as an alias for DCACHE_DONTCACHE
> and add a definition to nfs_fs.h so that if NFS ever does find a need to
> call d_mark_dontcache() the build will fail with a suitable error.
> 
> Signed-off-by: NeilBrown <neilb@xxxxxxx>
> ---
> 
> Hi Trond/Anna,
>  this patch is a precursor for my parallel-directory-updates patch set.
>  I would be particularly helpful if this (and the nfsd patches I
>  recently sent) could land for the next merge window.  Then I could post
>  a substantially reduced series to implement parallel directory
>  updates, which would then be easier for other to review.
> 
> Thanks,
> NeilBrown
> 
> 
>  fs/nfs/dir.c           | 23 ++++++++++++++++-------
>  include/linux/nfs_fs.h | 14 ++++++++++++++
>  2 files changed, 30 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 0c4e8dd6aa96..695bb057cbd2 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1778,6 +1778,8 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
>  	int ret;
>  
>  	if (flags & LOOKUP_RCU) {
> +		if (dentry->d_flags & DCACHE_NFS_PENDING_UNLINK)
> +			return -ECHILD;
>  		parent = READ_ONCE(dentry->d_parent);
>  		dir = d_inode_rcu(parent);
>  		if (!dir)
> @@ -1786,6 +1788,9 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
>  		if (parent != READ_ONCE(dentry->d_parent))
>  			return -ECHILD;
>  	} else {
> +		/* Wait for unlink to complete */
> +		wait_var_event(&dentry->d_flags,
> +			       !(dentry->d_flags & DCACHE_NFS_PENDING_UNLINK));
>  		parent = dget_parent(dentry);
>  		ret = reval(d_inode(parent), dentry, flags);
>  		dput(parent);
> @@ -2454,7 +2459,6 @@ static int nfs_safe_remove(struct dentry *dentry)
>  int nfs_unlink(struct inode *dir, struct dentry *dentry)
>  {
>  	int error;
> -	int need_rehash = 0;
>  
>  	dfprintk(VFS, "NFS: unlink(%s/%lu, %pd)\n", dir->i_sb->s_id,
>  		dir->i_ino, dentry);
> @@ -2469,15 +2473,20 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry)
>  		error = nfs_sillyrename(dir, dentry);
>  		goto out;
>  	}
> -	if (!d_unhashed(dentry)) {
> -		__d_drop(dentry);
> -		need_rehash = 1;
> -	}
> +	/* We must prevent any concurrent open until the unlink
> +	 * completes.  ->d_revalidate will wait for DCACHE_NFS_PENDING_UNLINK
> +	 * to clear.  We set it here to ensure no lookup succeeds until
> +	 * the unlink is complete on the server.
> +	 */
> +	dentry->d_flags |= DCACHE_NFS_PENDING_UNLINK;
> +
>  	spin_unlock(&dentry->d_lock);
>  	error = nfs_safe_remove(dentry);
>  	nfs_dentry_remove_handle_error(dir, dentry, error);
> -	if (need_rehash)
> -		d_rehash(dentry);
> +	spin_lock(&dentry->d_lock);
> +	dentry->d_flags &= ~DCACHE_NFS_PENDING_UNLINK;
> +	spin_unlock(&dentry->d_lock);
> +	wake_up_var(&dentry->d_flags);
>  out:
>  	trace_nfs_unlink_exit(dir, dentry, error);
>  	return error;
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index a17c337dbdf1..041a6076e045 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -617,6 +617,20 @@ nfs_fileid_to_ino_t(u64 fileid)
>  
>  #define NFS_JUKEBOX_RETRY_TIME (5 * HZ)
>  
> +/* We need to block new opens while a file is being unlinked.
> + * If it is opened *before* we decide to unlink, we will silly-rename
> + * instead. If it is opened *after*, then we the open to fail unless it creates

"then we allow the open to fail"

> + * a new file.
> + * If we allow the open and unlink to race, we could end up with a file that is
> + * open but deleted on the server resulting in ESTALE.
> + * So overload DCACHE_DONTCACHE to record when the unlink is happening
> + * and block dentry revalidation while it is set.
> + * DCACHE_DONTCACHE is only used by filesystems which call d_mark_dontcache()
> + * which NFS never calls.  It is only tested on a dentry on which all references
> + * have been dropped, so it is safe for NFS to set it while holding a reference.
> + */
> +#define DCACHE_NFS_PENDING_UNLINK DCACHE_DONTCACHE
> +#define d_mark_dontcache(i) BUILD_BUG_ON_MSG(1, "NFS cannot use d_mark_dontcache()")
>  
>  # undef ifdebug
>  # ifdef NFS_DEBUG

Wow, we really are out of dentry flags. I wonder if some of them are no
longer needed?

This overloading is a bit klunky but it's probably OK. AFAICT,
0x80000000 is still available though if this turns out to be too nasty.
It looks like 0x08000000 may also be free


Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux