Re: [PATCH] NFS: don't unhash dentry during unlink.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 07 Jul 2022, Jeff Layton wrote:
> On Wed, 2022-07-06 at 15:10 +1000, NeilBrown wrote:
> > NFS unlink() must determine if the file is open, and must perform a
> > "silly rename" instead of an unlink if it is.  Otherwise the client
> > might hold a file open which has been removed on the server.
> > 
> > Consequently if it determines that the file isn't open, it must block
> > any subsequent opens until the unlink has been completed on the server.
> > 
> > This is currently achieved by unhashing the dentry.  This forces any
> > open attempt to the slow-path for lookup which will block on i_sem on
> > the directory until the unlink completes.  A proposed patch will change
> > the VFS to only get a shared lock on i_sem for unlink, so this will no
> > longer work.
> > 
> > Instead we introduce an explicit interlock.  A flag is set on the dentry
> > while the unlink is running and ->d_revalidate blocks while that flag is
> > set.  This closes the race without requiring exclusion on i_sem.
> > unlink will still have exclusion on the dentry being unlinked, so it
> > will be safe to set and then clear the flag without any risk of another
> > thread touching the flag.
> > 
> > There is little room for adding new dentry flags, so instead of adding a
> > new flag, we overload an existing flag which is not used by NFS.
> > 
> > DCACHE_DONTCACHE is only set for filesystems which call
> > d_mark_dontcache() and NFS never calls this, so it is currently unused
> > in NFS.
> > DCACHE_DONTCACHE is only tested when the last reference on a dentry has
> > been dropped, so it is safe for NFS to set and then clear the flag while
> > holding a reference - the setting of the flag cannot cause a
> > misunderstanding.
> > 
> > So we define DCACHE_NFS_PENDING_UNLINK as an alias for DCACHE_DONTCACHE
> > and add a definition to nfs_fs.h so that if NFS ever does find a need to
> > call d_mark_dontcache() the build will fail with a suitable error.
> > 
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> > 
> > Hi Trond/Anna,
> >  this patch is a precursor for my parallel-directory-updates patch set.
> >  I would be particularly helpful if this (and the nfsd patches I
> >  recently sent) could land for the next merge window.  Then I could post
> >  a substantially reduced series to implement parallel directory
> >  updates, which would then be easier for other to review.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> >  fs/nfs/dir.c           | 23 ++++++++++++++++-------
> >  include/linux/nfs_fs.h | 14 ++++++++++++++
> >  2 files changed, 30 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index 0c4e8dd6aa96..695bb057cbd2 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -1778,6 +1778,8 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
> >  	int ret;
> >  
> >  	if (flags & LOOKUP_RCU) {
> > +		if (dentry->d_flags & DCACHE_NFS_PENDING_UNLINK)
> > +			return -ECHILD;
> >  		parent = READ_ONCE(dentry->d_parent);
> >  		dir = d_inode_rcu(parent);
> >  		if (!dir)
> > @@ -1786,6 +1788,9 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
> >  		if (parent != READ_ONCE(dentry->d_parent))
> >  			return -ECHILD;
> >  	} else {
> > +		/* Wait for unlink to complete */
> > +		wait_var_event(&dentry->d_flags,
> > +			       !(dentry->d_flags & DCACHE_NFS_PENDING_UNLINK));
> >  		parent = dget_parent(dentry);
> >  		ret = reval(d_inode(parent), dentry, flags);
> >  		dput(parent);
> > @@ -2454,7 +2459,6 @@ static int nfs_safe_remove(struct dentry *dentry)
> >  int nfs_unlink(struct inode *dir, struct dentry *dentry)
> >  {
> >  	int error;
> > -	int need_rehash = 0;
> >  
> >  	dfprintk(VFS, "NFS: unlink(%s/%lu, %pd)\n", dir->i_sb->s_id,
> >  		dir->i_ino, dentry);
> > @@ -2469,15 +2473,20 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry)
> >  		error = nfs_sillyrename(dir, dentry);
> >  		goto out;
> >  	}
> > -	if (!d_unhashed(dentry)) {
> > -		__d_drop(dentry);
> > -		need_rehash = 1;
> > -	}
> > +	/* We must prevent any concurrent open until the unlink
> > +	 * completes.  ->d_revalidate will wait for DCACHE_NFS_PENDING_UNLINK
> > +	 * to clear.  We set it here to ensure no lookup succeeds until
> > +	 * the unlink is complete on the server.
> > +	 */
> > +	dentry->d_flags |= DCACHE_NFS_PENDING_UNLINK;
> > +
> >  	spin_unlock(&dentry->d_lock);
> >  	error = nfs_safe_remove(dentry);
> >  	nfs_dentry_remove_handle_error(dir, dentry, error);
> > -	if (need_rehash)
> > -		d_rehash(dentry);
> > +	spin_lock(&dentry->d_lock);
> > +	dentry->d_flags &= ~DCACHE_NFS_PENDING_UNLINK;
> > +	spin_unlock(&dentry->d_lock);
> > +	wake_up_var(&dentry->d_flags);
> >  out:
> >  	trace_nfs_unlink_exit(dir, dentry, error);
> >  	return error;
> > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > index a17c337dbdf1..041a6076e045 100644
> > --- a/include/linux/nfs_fs.h
> > +++ b/include/linux/nfs_fs.h
> > @@ -617,6 +617,20 @@ nfs_fileid_to_ino_t(u64 fileid)
> >  
> >  #define NFS_JUKEBOX_RETRY_TIME (5 * HZ)
> >  
> > +/* We need to block new opens while a file is being unlinked.
> > + * If it is opened *before* we decide to unlink, we will silly-rename
> > + * instead. If it is opened *after*, then we the open to fail unless it creates
> 
> "then we allow the open to fail"

Actually it is "then we need the open to fail".
I should probably do a complete re-write of that para.

> 
> > + * a new file.
> > + * If we allow the open and unlink to race, we could end up with a file that is
> > + * open but deleted on the server resulting in ESTALE.
> > + * So overload DCACHE_DONTCACHE to record when the unlink is happening
> > + * and block dentry revalidation while it is set.
> > + * DCACHE_DONTCACHE is only used by filesystems which call d_mark_dontcache()
> > + * which NFS never calls.  It is only tested on a dentry on which all references
> > + * have been dropped, so it is safe for NFS to set it while holding a reference.
> > + */
> > +#define DCACHE_NFS_PENDING_UNLINK DCACHE_DONTCACHE
> > +#define d_mark_dontcache(i) BUILD_BUG_ON_MSG(1, "NFS cannot use d_mark_dontcache()")
> >  
> >  # undef ifdebug
> >  # ifdef NFS_DEBUG
> 
> Wow, we really are out of dentry flags. I wonder if some of them are no
> longer needed?
> 
> This overloading is a bit klunky but it's probably OK. AFAICT,
> 0x80000000 is still available though if this turns out to be too nasty.
> It looks like 0x08000000 may also be free

I need one of those two in a subsequent patch to lock a dentry while the
name/link is being created.  If I used the other for NFS_PENDING_UNLINK
we would be completely out.

This flag really should be completely private to nfs.  d_fsdata would be
the best place to put it.
But NFS doesn't have a permanent d_fsdata in which I can store a bit.
Nor does it leave d_fsdata untouched, so I cannot store a magic value in
there.
There are two different uses of d_fsdata.  I don't fully understand when
they are active, so I don't know if it is safe to add another
independent use - I suspect not though.

> 
> 
> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> 

Thanks,
NeilBrown





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux