Re: [PATCH] NFS: don't unhash dentry during unlink.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2022-07-26 at 14:40 +1000, NeilBrown wrote:
> 
> Hi Trond,
>  did you have a change to look at this patch?

I did take a look, and the patch does seem acceptable to me as it
stands.

However without the full context of the other patches you're writing,
I'm not able to ascertain whether or not a better approach than
overloading the meaning of DCACHE_DONTCACHE might exist. I'm wondering
if perhaps co-opting struct nfs_unlinkdata and/or creating something
similar for the non-sillyrename case might not work just as well?

> 
> Thanks,
> NeilBrown
> 
> On Wed, 06 Jul 2022, NeilBrown wrote:
> > NFS unlink() must determine if the file is open, and must perform a
> > "silly rename" instead of an unlink if it is.  Otherwise the client
> > might hold a file open which has been removed on the server.
> > 
> > Consequently if it determines that the file isn't open, it must
> > block
> > any subsequent opens until the unlink has been completed on the
> > server.
> > 
> > This is currently achieved by unhashing the dentry.  This forces
> > any
> > open attempt to the slow-path for lookup which will block on i_sem
> > on
> > the directory until the unlink completes.  A proposed patch will
> > change
> > the VFS to only get a shared lock on i_sem for unlink, so this will
> > no
> > longer work.
> > 
> > Instead we introduce an explicit interlock.  A flag is set on the
> > dentry
> > while the unlink is running and ->d_revalidate blocks while that
> > flag is
> > set.  This closes the race without requiring exclusion on i_sem.
> > unlink will still have exclusion on the dentry being unlinked, so
> > it
> > will be safe to set and then clear the flag without any risk of
> > another
> > thread touching the flag.
> > 
> > There is little room for adding new dentry flags, so instead of
> > adding a
> > new flag, we overload an existing flag which is not used by NFS.
> > 
> > DCACHE_DONTCACHE is only set for filesystems which call
> > d_mark_dontcache() and NFS never calls this, so it is currently
> > unused
> > in NFS.
> > DCACHE_DONTCACHE is only tested when the last reference on a dentry
> > has
> > been dropped, so it is safe for NFS to set and then clear the flag
> > while
> > holding a reference - the setting of the flag cannot cause a
> > misunderstanding.
> > 
> > So we define DCACHE_NFS_PENDING_UNLINK as an alias for
> > DCACHE_DONTCACHE
> > and add a definition to nfs_fs.h so that if NFS ever does find a
> > need to
> > call d_mark_dontcache() the build will fail with a suitable error.
> > 
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> > 
> > Hi Trond/Anna,
> >  this patch is a precursor for my parallel-directory-updates patch
> > set.
> >  I would be particularly helpful if this (and the nfsd patches I
> >  recently sent) could land for the next merge window.  Then I could
> > post
> >  a substantially reduced series to implement parallel directory
> >  updates, which would then be easier for other to review.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> >  fs/nfs/dir.c           | 23 ++++++++++++++++-------
> >  include/linux/nfs_fs.h | 14 ++++++++++++++
> >  2 files changed, 30 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index 0c4e8dd6aa96..695bb057cbd2 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -1778,6 +1778,8 @@ __nfs_lookup_revalidate(struct dentry
> > *dentry, unsigned int flags,
> >         int ret;
> >  
> >         if (flags & LOOKUP_RCU) {
> > +               if (dentry->d_flags & DCACHE_NFS_PENDING_UNLINK)
> > +                       return -ECHILD;
> >                 parent = READ_ONCE(dentry->d_parent);
> >                 dir = d_inode_rcu(parent);
> >                 if (!dir)
> > @@ -1786,6 +1788,9 @@ __nfs_lookup_revalidate(struct dentry
> > *dentry, unsigned int flags,
> >                 if (parent != READ_ONCE(dentry->d_parent))
> >                         return -ECHILD;
> >         } else {
> > +               /* Wait for unlink to complete */
> > +               wait_var_event(&dentry->d_flags,
> > +                              !(dentry->d_flags &
> > DCACHE_NFS_PENDING_UNLINK));
> >                 parent = dget_parent(dentry);
> >                 ret = reval(d_inode(parent), dentry, flags);
> >                 dput(parent);
> > @@ -2454,7 +2459,6 @@ static int nfs_safe_remove(struct dentry
> > *dentry)
> >  int nfs_unlink(struct inode *dir, struct dentry *dentry)
> >  {
> >         int error;
> > -       int need_rehash = 0;
> >  
> >         dfprintk(VFS, "NFS: unlink(%s/%lu, %pd)\n", dir->i_sb-
> > >s_id,
> >                 dir->i_ino, dentry);
> > @@ -2469,15 +2473,20 @@ int nfs_unlink(struct inode *dir, struct
> > dentry *dentry)
> >                 error = nfs_sillyrename(dir, dentry);
> >                 goto out;
> >         }
> > -       if (!d_unhashed(dentry)) {
> > -               __d_drop(dentry);
> > -               need_rehash = 1;
> > -       }
> > +       /* We must prevent any concurrent open until the unlink
> > +        * completes.  ->d_revalidate will wait for
> > DCACHE_NFS_PENDING_UNLINK
> > +        * to clear.  We set it here to ensure no lookup succeeds
> > until
> > +        * the unlink is complete on the server.
> > +        */
> > +       dentry->d_flags |= DCACHE_NFS_PENDING_UNLINK;
> > +
> >         spin_unlock(&dentry->d_lock);
> >         error = nfs_safe_remove(dentry);
> >         nfs_dentry_remove_handle_error(dir, dentry, error);
> > -       if (need_rehash)
> > -               d_rehash(dentry);
> > +       spin_lock(&dentry->d_lock);
> > +       dentry->d_flags &= ~DCACHE_NFS_PENDING_UNLINK;
> > +       spin_unlock(&dentry->d_lock);
> > +       wake_up_var(&dentry->d_flags);
> >  out:
> >         trace_nfs_unlink_exit(dir, dentry, error);
> >         return error;
> > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > index a17c337dbdf1..041a6076e045 100644
> > --- a/include/linux/nfs_fs.h
> > +++ b/include/linux/nfs_fs.h
> > @@ -617,6 +617,20 @@ nfs_fileid_to_ino_t(u64 fileid)
> >  
> >  #define NFS_JUKEBOX_RETRY_TIME (5 * HZ)
> >  
> > +/* We need to block new opens while a file is being unlinked.
> > + * If it is opened *before* we decide to unlink, we will silly-
> > rename
> > + * instead. If it is opened *after*, then we the open to fail
> > unless it creates
> > + * a new file.
> > + * If we allow the open and unlink to race, we could end up with a
> > file that is
> > + * open but deleted on the server resulting in ESTALE.
> > + * So overload DCACHE_DONTCACHE to record when the unlink is
> > happening
> > + * and block dentry revalidation while it is set.
> > + * DCACHE_DONTCACHE is only used by filesystems which call
> > d_mark_dontcache()
> > + * which NFS never calls.  It is only tested on a dentry on which
> > all references
> > + * have been dropped, so it is safe for NFS to set it while
> > holding a reference.
> > + */
> > +#define DCACHE_NFS_PENDING_UNLINK DCACHE_DONTCACHE
> > +#define d_mark_dontcache(i) BUILD_BUG_ON_MSG(1, "NFS cannot use
> > d_mark_dontcache()")
> >  
> >  # undef ifdebug
> >  # ifdef NFS_DEBUG
> > -- 
> > 2.36.1
> > 
> > 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux