On Mon, 2020-11-30 at 16:24 -0500, trondmy@xxxxxxxxxx wrote: > From: Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx> > > With NFSv3 nfsd will always attempt to send along WCC data to the > client. This generally involves saving off the in-core inode information > prior to doing the operation on the given filehandle, and then issuing a > vfs_getattr to it after the op. > > Some filesystems (particularly clustered or networked ones) have an > expensive ->getattr inode operation. Atomicitiy is also often difficult > or impossible to guarantee on such filesystems. For those, we're best > off not trying to provide WCC information to the client at all, and to > simply allow it to poll for that information as needed with a GETATTR > RPC. > > This patch adds a new flags field to struct export_operations, and > defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate > that nfsd should not attempt to provide WCC info in NFSv3 replies. It > also adds a blurb about the new flags field and flag to the exporting > documentation. > > The server will also now skip collecting this information for NFSv2 as > well, since that info is never used there anyway. > > Note that this patch does not add this flag to any filesystem > export_operations structures. This was originally developed to allow > reexporting nfs via nfsd. That code is not (and may never be) suitable > for merging into mainline. > Probably ought to fix up the above paragraph since we are now merging this into mainline. > Other filesystems may want to consider enabling this flag too. It's hard > to tell however which ones have export operations to enable export via > knfsd and which ones mostly rely on them for open-by-filehandle support, > so I'm leaving that up to the individual maintainers to decide. I am > cc'ing the relevant lists for those filesystems that I think may want to > consider adding this though. > > Cc: HPDD-discuss@xxxxxxxxxxxx > Cc: ceph-devel@xxxxxxxxxxxxxxx > Cc: cluster-devel@xxxxxxxxxx > Cc: fuse-devel@xxxxxxxxxxxxxxxxxxxxx > Cc: ocfs2-devel@xxxxxxxxxxxxxx > Signed-off-by: Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx> > Signed-off-by: Lance Shelton <lance.shelton@xxxxxxxxxxxxxxx> > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > --- > Documentation/filesystems/nfs/exporting.rst | 27 +++++++++++++++++++++ > fs/nfs/export.c | 1 + > fs/nfsd/nfs3xdr.c | 7 ++++-- > fs/nfsd/nfsfh.c | 14 +++++++++++ > fs/nfsd/nfsfh.h | 2 +- > include/linux/exportfs.h | 2 ++ > 6 files changed, 50 insertions(+), 3 deletions(-) > > diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst > index 33d588a01ace..a3e3805833d1 100644 > --- a/Documentation/filesystems/nfs/exporting.rst > +++ b/Documentation/filesystems/nfs/exporting.rst > @@ -154,6 +154,11 @@ struct which has the following members: > to find potential names, and matches inode numbers to find the correct > match. > > > > > + flags > + Some filesystems may need to be handled differently than others. The > + export_operations struct also includes a flags field that allows the > + filesystem to communicate such information to nfsd. See the Export > + Operations Flags section below for more explanation. > > > > > A filehandle fragment consists of an array of 1 or more 4byte words, > together with a one byte "type". > @@ -163,3 +168,25 @@ generated by encode_fh, in which case it will have been padded with > nuls. Rather, the encode_fh routine should choose a "type" which > indicates the decode_fh how much of the filehandle is valid, and how > it should be interpreted. > + > +Export Operations Flags > +----------------------- > +In addition to the operation vector pointers, struct export_operations also > +contains a "flags" field that allows the filesystem to communicate to nfsd > +that it may want to do things differently when dealing with it. The > +following flags are defined: > + > + EXPORT_OP_NOWCC > + RFC 1813 recommends that servers always send weak cache consistency > + (WCC) data to the client after each operation. The server should > + atomically collect attributes about the inode, do an operation on it, > + and then collect the attributes afterward. This allows the client to > + skip issuing GETATTRs in some situations but means that the server > + is calling vfs_getattr for almost all RPCs. On some filesystems > + (particularly those that are clustered or networked) this is expensive > + and atomicity is difficult to guarantee. This flag indicates to nfsd > + that it should skip providing WCC attributes to the client in NFSv3 > + replies when doing operations on this filesystem. Consider enabling > + this on filesystems that have an expensive ->getattr inode operation, > + or when atomicity between pre and post operation attribute collection > + is impossible to guarantee. > diff --git a/fs/nfs/export.c b/fs/nfs/export.c > index 3430d6891e89..8f4c528865c5 100644 > --- a/fs/nfs/export.c > +++ b/fs/nfs/export.c > @@ -171,4 +171,5 @@ const struct export_operations nfs_export_ops = { > .encode_fh = nfs_encode_fh, > .fh_to_dentry = nfs_fh_to_dentry, > .get_parent = nfs_get_parent, > + .flags = EXPORT_OP_NOWCC, > }; > diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c > index 2277f83da250..480342675292 100644 > --- a/fs/nfsd/nfs3xdr.c > +++ b/fs/nfsd/nfs3xdr.c > @@ -206,7 +206,7 @@ static __be32 * > encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) > { > struct dentry *dentry = fhp->fh_dentry; > - if (dentry && d_really_is_positive(dentry)) { > + if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) { > __be32 err; > struct kstat stat; > > > > > @@ -261,7 +261,7 @@ void fill_pre_wcc(struct svc_fh *fhp) > struct kstat stat; > __be32 err; > > > > > - if (fhp->fh_pre_saved) > + if (fhp->fh_no_wcc || fhp->fh_pre_saved) > return; > > > > > inode = d_inode(fhp->fh_dentry); > @@ -287,6 +287,9 @@ void fill_post_wcc(struct svc_fh *fhp) > { > __be32 err; > > > > > + if (fhp->fh_no_wcc) > + return; > + > if (fhp->fh_post_saved) > printk("nfsd: inode locked twice during operation.\n"); > > > > > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c > index c81dbbad8792..0c2ee65e46f3 100644 > --- a/fs/nfsd/nfsfh.c > +++ b/fs/nfsd/nfsfh.c > @@ -291,6 +291,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) > > > > > fhp->fh_dentry = dentry; > fhp->fh_export = exp; > + > + switch (rqstp->rq_vers) { > + case 3: > + if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC)) > + break; > + /* Fallthrough */ > + case 2: > + fhp->fh_no_wcc = true; > + } > + > return 0; > out: > exp_put(exp); > @@ -559,6 +569,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, > */ > set_version_and_fsid_type(fhp, exp, ref_fh); > > > > > + /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */ > + fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false; > + > if (ref_fh == fhp) > fh_put(ref_fh); > > > > > @@ -662,6 +675,7 @@ fh_put(struct svc_fh *fhp) > exp_put(exp); > fhp->fh_export = NULL; > } > + fhp->fh_no_wcc = false; > return; > } > > > > > diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h > index 56cfbc361561..fb2b60a76b32 100644 > --- a/fs/nfsd/nfsfh.h > +++ b/fs/nfsd/nfsfh.h > @@ -35,6 +35,7 @@ typedef struct svc_fh { > > > > > bool fh_locked; /* inode locked by us */ > bool fh_want_write; /* remount protection taken */ > + bool fh_no_wcc; /* no wcc data needed */ > int fh_flags; /* FH flags */ > #ifdef CONFIG_NFSD_V3 > bool fh_post_saved; /* post-op attrs saved */ > @@ -54,7 +55,6 @@ typedef struct svc_fh { > struct kstat fh_post_attr; /* full attrs after operation */ > u64 fh_post_change; /* nfsv4 change; see above */ > #endif /* CONFIG_NFSD_V3 */ > - > } svc_fh; > #define NFSD4_FH_FOREIGN (1<<0) > #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f)) > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h > index 3ceb72b67a7a..e7de0103a32e 100644 > --- a/include/linux/exportfs.h > +++ b/include/linux/exportfs.h > @@ -213,6 +213,8 @@ struct export_operations { > bool write, u32 *device_generation); > int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, > int nr_iomaps, struct iattr *iattr); > +#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */ > + unsigned long flags; > }; > > > > > extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid, -- Jeff Layton <jlayton@xxxxxxxxxxxxxxx>