Re: [HPDD-discuss] [PATCH] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Sat, Sep 12, 2015 at 06:24:54AM -0400, Jeff Layton wrote:
> On Sat, 12 Sep 2015 04:41:33 +0000
> "Dilger, Andreas" <andreas.dilger@xxxxxxxxx> wrote:
> 
> > On 2015/09/11, 4:20 AM, "HPDD-discuss on behalf of Jeff Layton"
> > <hpdd-discuss-bounces@xxxxxxxxxxxx on behalf of jlayton@xxxxxxxxxxxxxxx>
> > wrote:
> > 
> > >With NFSv3 nfsd will always attempt to send along WCC data to the
> > >client. This generally involves saving off the in-core inode information
> > >prior to doing the operation on the given filehandle, and then issuing a
> > >vfs_getattr to it after the op.
> > >
> > >Some filesystems (particularly clustered or networked ones) have an
> > >expensive ->getattr inode operation. Atomicity is also often difficult
> > >or impossible to guarantee on such filesystems. For those, we're best
> > >off not trying to provide WCC information to the client at all, and
> > >simply letting it poll for that information as needed with a GETATTR
> > >RPC.
> > >
> > >This patch adds a new flags field to struct export_operations, and
> > >defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
> > >that nfsd should not attempt to provide WCC info in NFSv3 replies. It
> > >also adds a blurb about the new flags field and flag to the exporting
> > >documentation.
> > >
> > >The server will also now skip collecting this information for NFSv2,
> > >since that info is never used there anyway.
> > >
> > >Note that this patch does not add this flag to any filesystem
> > >export_operations structures. This was originally developed to allow
> > >reexporting nfs via nfsd. That code is not (and may never be) suitable
> > >for merging into mainline.
> > >
> > >Other filesystems may want to consider enabling this flag too. However,
> > >it's hard to tell which ones have export operations to enable export via
> > >knfsd and which ones mostly rely on them for open-by-filehandle support,
> > >so I'm leaving that up to the individual maintainers to decide. I am
> > >cc'ing the relevant lists for the filesystems that I think may want to
> > >consider enabling it.
> > >
> > >Cc: HPDD-discuss@xxxxxxxxxxxx
> > >Cc: ceph-devel@xxxxxxxxxxxxxxx
> > >Cc: cluster-devel@xxxxxxxxxx
> > >Cc: fuse-devel@xxxxxxxxxxxxxxxxxxxxx
> > >Cc: ocfs2-devel@xxxxxxxxxxxxxx
> > >Signed-off-by: Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx>
> > >---
> > > Documentation/filesystems/nfs/Exporting | 27 +++++++++++++++++++++++++++
> > > fs/nfsd/nfs3xdr.c                       |  5 ++++-
> > > fs/nfsd/nfsfh.c                         | 14 ++++++++++++++
> > > fs/nfsd/nfsfh.h                         |  5 ++++-
> > > include/linux/exportfs.h                |  2 ++
> > > 5 files changed, 51 insertions(+), 2 deletions(-)
> > >
> > >diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
> > >index 520a4becb75c..fa636cde3907 100644
> > >--- a/Documentation/filesystems/nfs/Exporting
> > >+++ b/Documentation/filesystems/nfs/Exporting
> > >@@ -138,6 +138,11 @@ struct which has the following members:
> > >     to find potential names, and matches inode numbers to find the correct
> > >     match.
> > > 
> > >+  flags
> > >+    Some filesystems may need to be handled differently than others. The
> > >+    export_operations struct also includes a flags field that allows the
> > >+    filesystem to communicate such information to nfsd. See the Export
> > >+    Operations Flags section below for more explanation.
> > > 
> > > A filehandle fragment consists of an array of 1 or more 4byte words,
> > > together with a one byte "type".
> > >@@ -147,3 +152,25 @@ generated by encode_fh, in which case it will have been padded with
> > > nuls.  Rather, the encode_fh routine should choose a "type" which
> > > indicates the decode_fh how much of the filehandle is valid, and how
> > > it should be interpreted.
> > >+
> > >+Export Operations Flags
> > >+-----------------------
> > >+In addition to the operation vector pointers, struct export_operations also
> > >+contains a "flags" field that allows the filesystem to communicate to nfsd
> > >+that it may want to do things differently when dealing with it. The
> > >+following flags are defined:
> > >+
> > >+  EXPORT_OP_NOWCC
> > >+    RFC 1813 recommends that servers always send weak cache consistency
> > >+    (WCC) data to the client after each operation. The server should
> > >+    atomically collect attributes about the inode, do an operation on it,
> > >+    and then collect the attributes afterward. This allows the client to
> > >+    skip issuing GETATTRs in some situations but means that the server
> > >+    is calling vfs_getattr for almost all RPCs. On some filesystems
> > >+    (particularly those that are clustered or networked) this is expensive
> > >+    and atomicity is difficult to guarantee. This flag indicates to nfsd
> > >+    that it should skip providing WCC attributes to the client in NFSv3
> > >+    replies when doing operations on this filesystem. Consider enabling
> > >+    this on filesystems that have an expensive ->getattr inode operation,
> > >+    or when atomicity between pre and post operation attribute collection
> > >+    is impossible to guarantee.
> > >diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> > >index 01dcd494f781..c30c8c604e2a 100644
> > >--- a/fs/nfsd/nfs3xdr.c
> > >+++ b/fs/nfsd/nfs3xdr.c
> > >@@ -203,7 +203,7 @@ static __be32 *
> > > encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
> > > {
> > > 	struct dentry *dentry = fhp->fh_dentry;
> > >-	if (dentry && d_really_is_positive(dentry)) {
> > >+	if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
> > > 	        __be32 err;
> > > 		struct kstat stat;
> > > 
> > >@@ -256,6 +256,9 @@ void fill_post_wcc(struct svc_fh *fhp)
> > > {
> > > 	__be32 err;
> > > 
> > >+	if (fhp->fh_no_wcc)
> > >+		return;
> > >+
> > > 	if (fhp->fh_post_saved)
> > > 		printk("nfsd: inode locked twice during operation.\n");
> > > 
> > >diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > >index 350041a40fe5..29ae37f62b9b 100644
> > >--- a/fs/nfsd/nfsfh.c
> > >+++ b/fs/nfsd/nfsfh.c
> > >@@ -267,6 +267,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
> > > 
> > > 	fhp->fh_dentry = dentry;
> > > 	fhp->fh_export = exp;
> > >+
> > >+	switch (rqstp->rq_vers) {
> > >+	case 3:
> > >+		if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
> > >+			break;
> > >+		/* Fallthrough */
> > >+	case 2:
> > >+		fhp->fh_no_wcc = true;
> > >+	}
> > >+
> > > 	return 0;
> > > out:
> > > 	exp_put(exp);
> > >@@ -535,6 +545,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> > > 	 */
> > > 	 set_version_and_fsid_type(fhp, exp, ref_fh);
> > > 
> > >+	/* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
> > >+	fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
> > >+
> > > 	if (ref_fh == fhp)
> > > 		fh_put(ref_fh);
> > > 
> > >@@ -641,6 +654,7 @@ fh_put(struct svc_fh *fhp)
> > > 		exp_put(exp);
> > > 		fhp->fh_export = NULL;
> > > 	}
> > >+	fhp->fh_no_wcc = false;
> > > 	return;
> > > }
> > > 
> > >diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> > >index 1e90dad4926b..9ddead4d98f8 100644
> > >--- a/fs/nfsd/nfsfh.h
> > >+++ b/fs/nfsd/nfsfh.h
> > >@@ -32,6 +32,7 @@ typedef struct svc_fh {
> > > 
> > > 	unsigned char		fh_locked;	/* inode locked by us */
> > > 	unsigned char		fh_want_write;	/* remount protection taken */
> > >+	bool			fh_no_wcc;	/* no wcc data needed */
> > 
> > This increases the size of svc_fh because it splits the four unsigned chars.
> > You could change all of these (fh_locked, fh_want_write, fh_{pre,post}_saved)
> > to be bools to avoid that and make it clearer that they are only used as
> > booleans (I verified that they all are only assigned 0 or 1).
> > 
> 
> I don't think it matters, at least not on x86_64. bools and chars both
> require a byte. pahole does show this adding a new hole, but that's
> just because this brings the struct up to 5 flags and the next field
> (fh_pre_size) needs to be aligned.
> 
> I do agree, however, that replacing those other unsigned chars with bools
> is clearer. Maybe we should even replace them all with a single
> unsigned int and use bitops to set flags in there. That would be more
> space efficient now that we're at 5 flags.

Makes sense to me.

--b.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


