On Tue, 2022-07-26 at 14:40 +1000, NeilBrown wrote: > > Hi Trond, > did you have a change to look at this patch? I did take a look, and the patch does seem acceptable to me as it stands. However without the full context of the other patches you're writing, I'm not able to ascertain whether or not a better approach than overloading the meaning of DCACHE_DONTCACHE might exist. I'm wondering if perhaps co-opting struct nfs_unlinkdata and/or creating something similar for the non-sillyrename case might not work just as well? > > Thanks, > NeilBrown > > On Wed, 06 Jul 2022, NeilBrown wrote: > > NFS unlink() must determine if the file is open, and must perform a > > "silly rename" instead of an unlink if it is. Otherwise the client > > might hold a file open which has been removed on the server. > > > > Consequently if it determines that the file isn't open, it must > > block > > any subsequent opens until the unlink has been completed on the > > server. > > > > This is currently achieved by unhashing the dentry. This forces > > any > > open attempt to the slow-path for lookup which will block on i_sem > > on > > the directory until the unlink completes. A proposed patch will > > change > > the VFS to only get a shared lock on i_sem for unlink, so this will > > no > > longer work. > > > > Instead we introduce an explicit interlock. A flag is set on the > > dentry > > while the unlink is running and ->d_revalidate blocks while that > > flag is > > set. This closes the race without requiring exclusion on i_sem. > > unlink will still have exclusion on the dentry being unlinked, so > > it > > will be safe to set and then clear the flag without any risk of > > another > > thread touching the flag. > > > > There is little room for adding new dentry flags, so instead of > > adding a > > new flag, we overload an existing flag which is not used by NFS. > > > > DCACHE_DONTCACHE is only set for filesystems which call > > d_mark_dontcache() and NFS never calls this, so it is currently > > unused > > in NFS. > > DCACHE_DONTCACHE is only tested when the last reference on a dentry > > has > > been dropped, so it is safe for NFS to set and then clear the flag > > while > > holding a reference - the setting of the flag cannot cause a > > misunderstanding. > > > > So we define DCACHE_NFS_PENDING_UNLINK as an alias for > > DCACHE_DONTCACHE > > and add a definition to nfs_fs.h so that if NFS ever does find a > > need to > > call d_mark_dontcache() the build will fail with a suitable error. > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > --- > > > > Hi Trond/Anna, > > this patch is a precursor for my parallel-directory-updates patch > > set. > > I would be particularly helpful if this (and the nfsd patches I > > recently sent) could land for the next merge window. Then I could > > post > > a substantially reduced series to implement parallel directory > > updates, which would then be easier for other to review. > > > > Thanks, > > NeilBrown > > > > > > fs/nfs/dir.c | 23 ++++++++++++++++------- > > include/linux/nfs_fs.h | 14 ++++++++++++++ > > 2 files changed, 30 insertions(+), 7 deletions(-) > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > > index 0c4e8dd6aa96..695bb057cbd2 100644 > > --- a/fs/nfs/dir.c > > +++ b/fs/nfs/dir.c > > @@ -1778,6 +1778,8 @@ __nfs_lookup_revalidate(struct dentry > > *dentry, unsigned int flags, > > int ret; > > > > if (flags & LOOKUP_RCU) { > > + if (dentry->d_flags & DCACHE_NFS_PENDING_UNLINK) > > + return -ECHILD; > > parent = READ_ONCE(dentry->d_parent); > > dir = d_inode_rcu(parent); > > if (!dir) > > @@ -1786,6 +1788,9 @@ __nfs_lookup_revalidate(struct dentry > > *dentry, unsigned int flags, > > if (parent != READ_ONCE(dentry->d_parent)) > > return -ECHILD; > > } else { > > + /* Wait for unlink to complete */ > > + wait_var_event(&dentry->d_flags, > > + !(dentry->d_flags & > > DCACHE_NFS_PENDING_UNLINK)); > > parent = dget_parent(dentry); > > ret = reval(d_inode(parent), dentry, flags); > > dput(parent); > > @@ -2454,7 +2459,6 @@ static int nfs_safe_remove(struct dentry > > *dentry) > > int nfs_unlink(struct inode *dir, struct dentry *dentry) > > { > > int error; > > - int need_rehash = 0; > > > > dfprintk(VFS, "NFS: unlink(%s/%lu, %pd)\n", dir->i_sb- > > >s_id, > > dir->i_ino, dentry); > > @@ -2469,15 +2473,20 @@ int nfs_unlink(struct inode *dir, struct > > dentry *dentry) > > error = nfs_sillyrename(dir, dentry); > > goto out; > > } > > - if (!d_unhashed(dentry)) { > > - __d_drop(dentry); > > - need_rehash = 1; > > - } > > + /* We must prevent any concurrent open until the unlink > > + * completes. ->d_revalidate will wait for > > DCACHE_NFS_PENDING_UNLINK > > + * to clear. We set it here to ensure no lookup succeeds > > until > > + * the unlink is complete on the server. > > + */ > > + dentry->d_flags |= DCACHE_NFS_PENDING_UNLINK; > > + > > spin_unlock(&dentry->d_lock); > > error = nfs_safe_remove(dentry); > > nfs_dentry_remove_handle_error(dir, dentry, error); > > - if (need_rehash) > > - d_rehash(dentry); > > + spin_lock(&dentry->d_lock); > > + dentry->d_flags &= ~DCACHE_NFS_PENDING_UNLINK; > > + spin_unlock(&dentry->d_lock); > > + wake_up_var(&dentry->d_flags); > > out: > > trace_nfs_unlink_exit(dir, dentry, error); > > return error; > > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h > > index a17c337dbdf1..041a6076e045 100644 > > --- a/include/linux/nfs_fs.h > > +++ b/include/linux/nfs_fs.h > > @@ -617,6 +617,20 @@ nfs_fileid_to_ino_t(u64 fileid) > > > > #define NFS_JUKEBOX_RETRY_TIME (5 * HZ) > > > > +/* We need to block new opens while a file is being unlinked. > > + * If it is opened *before* we decide to unlink, we will silly- > > rename > > + * instead. If it is opened *after*, then we the open to fail > > unless it creates > > + * a new file. > > + * If we allow the open and unlink to race, we could end up with a > > file that is > > + * open but deleted on the server resulting in ESTALE. > > + * So overload DCACHE_DONTCACHE to record when the unlink is > > happening > > + * and block dentry revalidation while it is set. > > + * DCACHE_DONTCACHE is only used by filesystems which call > > d_mark_dontcache() > > + * which NFS never calls. It is only tested on a dentry on which > > all references > > + * have been dropped, so it is safe for NFS to set it while > > holding a reference. > > + */ > > +#define DCACHE_NFS_PENDING_UNLINK DCACHE_DONTCACHE > > +#define d_mark_dontcache(i) BUILD_BUG_ON_MSG(1, "NFS cannot use > > d_mark_dontcache()") > > > > # undef ifdebug > > # ifdef NFS_DEBUG > > -- > > 2.36.1 > > > > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx