Re: [PATCH v2 26/27] NFSD: Add tracepoints in the NFS dispatcher

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Sep 25, 2020, at 9:59 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> 
> 
> 
>> On Sep 24, 2020, at 7:45 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>> 
>> On Mon, Sep 21, 2020 at 02:13:07PM -0400, Chuck Lever wrote:
>>> This is follow-on work to the tracepoints added in the NFS server's
>>> duplicate reply cache. Here, tracepoints are introduced that report
>>> replies from cache as well as encoding and decoding errors.
>>> 
>>> The NFSv2, v3, and v4 dispatcher requirements have diverged over
>>> time, leaving us with a little bit of technical debt. In addition,
>>> I wanted to add a tracepoint for NFSv2 and NFSv3 similar to the
>>> nfsd4_compound/compoundstatus tracepoints. Lastly, removing some
>>> conditional branches from this hot path helps optimize CPU
>>> utilization. So, I've duplicated the nfsd_dispatcher function.
>> 
>> Comparing current nfsd_dispatch to the nfsv2/v3 nfsd_dispatch: the only
>> thing I spotted removed from the v2/v3-specific dispatch is the
>> rq_lease_breaker = NULL.  (I think that's not correct, actually.  We
>> could remove the need for that to be set in the v2/v3 case, but with the
>> current code it does need to be set.)
> 
> Noted with thanks.
> 
> 
>> Comparing current nfsd_dispatch to the nfsv4 nfsd4_dispatch, the
>> v4-specific dispatch does away with nfs_request_too_big() and the
>> v2-specific shortcut in the error encoding case.
>> 
>> So these still look *very* similar.  I don't feel like we're getting a
>> lot of benefit out of splitting these out.
> 
> I don't disagree with that at all. At this point I'm just noodling
> to see what's possible. I'm now toying with other ways to add high-
> value tracing in the legacy ULPs. In the end I might end up avoiding
> significant changes in the dispatchers in order to add tracing.
> 
> However, a few thoughts I had while learning how the dispatcher
> code works.
> 
> There are some opportunities for reducing instruction path length
> and the number of conditional branches in here. It's a hot path,
> so I think we should consider some careful micro-optimizations
> even if they don't add significant new features or do add some
> code duplication.
> 
> In user space, the library (iirc) assumes each ULP provides it's
> own dispatcher function. I'd consider duplicating and removing
> svc_generic_dispatcher() to simplify the pasta in svc_process(),
> again as a micro-optimization and for better code legibility.
> 
> lockd's pc_func returns an RPC accept_stat, but the NFSD pc_func
> methods return an NFS status. The latter feels like a layering
> violation for the sake of reducing a small amount of code
> duplication. I'd rather see encoding of the NFS status handled in
> the NFS Reply encoders, since that is an XDR function, and because
> that logic seems slightly different for NFSv2, support for which
> we'd like to deprecate at some point.

I misremembered this. nfsd_dispatch simply doesn't set *statp
in the success case, which seems strange.


> Note also that *statp in nfsd_dispatch is never explicitly set to
> rpc_success in the normal execution flow. It relies on the
> equivalence of rpc_success and nfs_ok, which is convenient, but
> confusing to read. It might be cleaner if *statp was made an enum
> to make it explicit what set of values go in that return variable.
> 
> 
>> By the way, the combination of copying a bunch of code, doing some
>> common cleanup, and dropping version-specific stuff makes it a little
>> hard to sort out what's going on.  If it were me, I'd do this as:
>> 
>> 	- one patch to do common cleanup (dropping some redundant
>> 	  comments, using argv/resv variables to cleanup calculations,
>> 	  etc.)
>> 	- a second patch that just duplicates the one nfsd_dispatch into
>> 	  nfsd_dispatch and nfsd4_dispatch
>> 	- a third patch that just removes the stuff we don't need from
>> 	  the newly duplicated dispatchers.
>> 
>> and then it'd be obvious what's changed.
> 
> Good points. The series is still immature, and I tend to maintain
> larger checkpoint patches that get broken down over time to make
> it more obvious what is changing and why. I'll keep your comments
> in mind as this work evolves. Feel free to make similar suggestions
> about my future work.
> 
> 
>> --b.
>> 
>>> 
>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>> ---
>>> fs/nfsd/nfs2acl.c  |    2 -
>>> fs/nfsd/nfs3acl.c  |    2 -
>>> fs/nfsd/nfs4proc.c |    2 -
>>> fs/nfsd/nfsd.h     |    1 
>>> fs/nfsd/nfssvc.c   |  198 ++++++++++++++++++++++++++++++++++------------------
>>> fs/nfsd/trace.h    |   50 +++++++++++++
>>> 6 files changed, 183 insertions(+), 72 deletions(-)
>>> 
>>> diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c
>>> index 54e597918822..894b8f0594e2 100644
>>> --- a/fs/nfsd/nfs2acl.c
>>> +++ b/fs/nfsd/nfs2acl.c
>>> @@ -416,6 +416,6 @@ const struct svc_version nfsd_acl_version2 = {
>>> 	.vs_nproc	= 5,
>>> 	.vs_proc	= nfsd_acl_procedures2,
>>> 	.vs_count	= nfsd_acl_count2,
>>> -	.vs_dispatch	= nfsd_dispatch,
>>> +	.vs_dispatch	= nfsd4_dispatch,
>>> 	.vs_xdrsize	= NFS3_SVC_XDRSIZE,
>>> };
>>> diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c
>>> index 7f512dec7460..924ebb19c59c 100644
>>> --- a/fs/nfsd/nfs3acl.c
>>> +++ b/fs/nfsd/nfs3acl.c
>>> @@ -282,7 +282,7 @@ const struct svc_version nfsd_acl_version3 = {
>>> 	.vs_nproc	= 3,
>>> 	.vs_proc	= nfsd_acl_procedures3,
>>> 	.vs_count	= nfsd_acl_count3,
>>> -	.vs_dispatch	= nfsd_dispatch,
>>> +	.vs_dispatch	= nfsd4_dispatch,
>>> 	.vs_xdrsize	= NFS3_SVC_XDRSIZE,
>>> };
>>> 
>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>> index 49109645af24..61302641f651 100644
>>> --- a/fs/nfsd/nfs4proc.c
>>> +++ b/fs/nfsd/nfs4proc.c
>>> @@ -3320,7 +3320,7 @@ const struct svc_version nfsd_version4 = {
>>> 	.vs_nproc		= 2,
>>> 	.vs_proc		= nfsd_procedures4,
>>> 	.vs_count		= nfsd_count3,
>>> -	.vs_dispatch		= nfsd_dispatch,
>>> +	.vs_dispatch		= nfsd4_dispatch,
>>> 	.vs_xdrsize		= NFS4_SVC_XDRSIZE,
>>> 	.vs_rpcb_optnl		= true,
>>> 	.vs_need_cong_ctrl	= true,
>>> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
>>> index cb742e17e04a..7fa4b19dd2f7 100644
>>> --- a/fs/nfsd/nfsd.h
>>> +++ b/fs/nfsd/nfsd.h
>>> @@ -78,6 +78,7 @@ extern const struct seq_operations nfs_exports_op;
>>> */
>>> int		nfsd_svc(int nrservs, struct net *net, const struct cred *cred);
>>> int		nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp);
>>> +int		nfsd4_dispatch(struct svc_rqst *rqstp, __be32 *statp);
>>> 
>>> int		nfsd_nrthreads(struct net *);
>>> int		nfsd_nrpools(struct net *);
>>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
>>> index f7f6473578af..d626eea1c78a 100644
>>> --- a/fs/nfsd/nfssvc.c
>>> +++ b/fs/nfsd/nfssvc.c
>>> @@ -28,6 +28,7 @@
>>> #include "vfs.h"
>>> #include "netns.h"
>>> #include "filecache.h"
>>> +#include "trace.h"
>>> 
>>> #define NFSDDBG_FACILITY	NFSDDBG_SVC
>>> 
>>> @@ -964,7 +965,7 @@ static __be32 map_new_errors(u32 vers, __be32 nfserr)
>>> {
>>> 	if (nfserr == nfserr_jukebox && vers == 2)
>>> 		return nfserr_dropit;
>>> -	if (nfserr == nfserr_wrongsec && vers < 4)
>>> +	if (nfserr == nfserr_wrongsec)
>>> 		return nfserr_acces;
>>> 	return nfserr;
>>> }
>>> @@ -980,18 +981,6 @@ static __be32 map_new_errors(u32 vers, __be32 nfserr)
>>> static bool nfs_request_too_big(struct svc_rqst *rqstp,
>>> 				const struct svc_procedure *proc)
>>> {
>>> -	/*
>>> -	 * The ACL code has more careful bounds-checking and is not
>>> -	 * susceptible to this problem:
>>> -	 */
>>> -	if (rqstp->rq_prog != NFS_PROGRAM)
>>> -		return false;
>>> -	/*
>>> -	 * Ditto NFSv4 (which can in theory have argument and reply both
>>> -	 * more than a page):
>>> -	 */
>>> -	if (rqstp->rq_vers >= 4)
>>> -		return false;
>>> 	/* The reply will be small, we're OK: */
>>> 	if (proc->pc_xdrressize > 0 &&
>>> 	    proc->pc_xdrressize < XDR_QUADLEN(PAGE_SIZE))
>>> @@ -1000,81 +989,152 @@ static bool nfs_request_too_big(struct svc_rqst *rqstp,
>>> 	return rqstp->rq_arg.len > PAGE_SIZE;
>>> }
>>> 
>>> -int
>>> -nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp)
>>> +/**
>>> + * nfsd_dispatch - Process an NFSv2 or NFSv3 request
>>> + * @rqstp: incoming NFS request
>>> + * @statp: OUT: RPC accept_stat value
>>> + *
>>> + * Return values:
>>> + *  %0: Processing complete; do not send a Reply
>>> + *  %1: Processing complete; send a Reply
>>> + */
>>> +int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp)
>>> {
>>> -	const struct svc_procedure *proc;
>>> -	__be32			nfserr;
>>> -	__be32			*nfserrp;
>>> -
>>> -	dprintk("nfsd_dispatch: vers %d proc %d\n",
>>> -				rqstp->rq_vers, rqstp->rq_proc);
>>> -	proc = rqstp->rq_procinfo;
>>> -
>>> -	if (nfs_request_too_big(rqstp, proc)) {
>>> -		dprintk("nfsd: NFSv%d argument too large\n", rqstp->rq_vers);
>>> -		*statp = rpc_garbage_args;
>>> -		return 1;
>>> +	const struct svc_procedure *proc = rqstp->rq_procinfo;
>>> +	struct kvec *argv = &rqstp->rq_arg.head[0];
>>> +	struct kvec *resv = &rqstp->rq_res.head[0];
>>> +	__be32 nfserr, *nfserrp;
>>> +
>>> +	if (nfs_request_too_big(rqstp, proc))
>>> +		goto out_too_large;
>>> +
>>> +	if (proc->pc_decode && !procp->pc_decode(rqstp, argv->iov_base))
>>> +		goto out_decode_err;
>>> +
>>> +	rqstp->rq_cachetype = proc->pc_cachetype;
>>> +	switch (nfsd_cache_lookup(rqstp)) {
>>> +	case RC_DROPIT:
>>> +		goto out_dropit;
>>> +	case RC_REPLY:
>>> +		goto out_cached_reply;
>>> +	case RC_DOIT:
>>> +		break;
>>> 	}
>>> -	rqstp->rq_lease_breaker = NULL;
>>> +
>>> +	nfserrp = resv->iov_base + resv->iov_len;
>>> +	resv->iov_len += sizeof(__be32);
>>> +	nfserr = proc->pc_func(rqstp);
>>> +	nfserr = map_new_errors(rqstp->rq_vers, nfserr);
>>> +	if (nfserr == nfserr_dropit || test_bit(RQ_DROPME, &rqstp->rq_flags))
>>> +		goto out_update_drop;
>>> +	if (rqstp->rq_proc)
>>> +		*nfserrp++ = nfserr;
>>> +
>>> +	/* For NFSv2, additional info is never returned in case of an error. */
>>> +	if (!(nfserr && rqstp->rq_vers == 2))
>>> +		if (proc->pc_encode && !proc->pc_encode(rqstp, nfserrp))
>>> +			goto out_encode_err;
>>> +
>>> +	nfsd_cache_update(rqstp, proc->pc_cachetype, statp + 1);
>>> +	trace_nfsd_svc_status(rqstp, proc, nfserr);
>>> +	return 1;
>>> +
>>> +out_too_large:
>>> +	trace_nfsd_svc_too_large_err(rqstp);
>>> +	*statp = rpc_garbage_args;
>>> +	return 1;
>>> +
>>> +out_decode_err:
>>> +	trace_nfsd_svc_decode_err(rqstp);
>>> +	*statp = rpc_garbage_args;
>>> +	return 1;
>>> +
>>> +out_update_drop:
>>> +	nfsd_cache_update(rqstp, RC_NOCACHE, NULL);
>>> +out_dropit:
>>> +	trace_nfsd_svc_dropit(rqstp);
>>> +	return 0;
>>> +
>>> +out_cached_reply:
>>> +	trace_nfsd_svc_cached_reply(rqstp);
>>> +	return 1;
>>> +
>>> +out_encode_err:
>>> +	trace_nfsd_svc_encode_err(rqstp);
>>> +	nfsd_cache_update(rqstp, RC_NOCACHE, NULL);
>>> +	*statp = rpc_system_err;
>>> +	return 1;
>>> +}
>>> +
>>> +/**
>>> + * nfsd4_dispatch - Process an NFSv4 or NFSACL request
>>> + * @rqstp: incoming NFS request
>>> + * @statp: OUT: RPC accept_stat value
>>> + *
>>> + * Return values:
>>> + *  %0: Processing complete; do not send a Reply
>>> + *  %1: Processing complete; send a Reply
>>> + */
>>> +int nfsd4_dispatch(struct svc_rqst *rqstp, __be32 *statp)
>>> +{
>>> +	const struct svc_procedure *proc = rqstp->rq_procinfo;
>>> +	struct kvec *argv = &rqstp->rq_arg.head[0];
>>> +	struct kvec *resv = &rqstp->rq_res.head[0];
>>> +	__be32 nfserr, *nfserrp;
>>> +
>>> 	/*
>>> 	 * Give the xdr decoder a chance to change this if it wants
>>> 	 * (necessary in the NFSv4.0 compound case)
>>> 	 */
>>> 	rqstp->rq_cachetype = proc->pc_cachetype;
>>> -	/* Decode arguments */
>>> -	if (proc->pc_decode &&
>>> -	    !proc->pc_decode(rqstp, (__be32*)rqstp->rq_arg.head[0].iov_base)) {
>>> -		dprintk("nfsd: failed to decode arguments!\n");
>>> -		*statp = rpc_garbage_args;
>>> -		return 1;
>>> -	}
>>> +	rqstp->rq_lease_breaker = NULL;
>>> +
>>> +	if (proc->pc_decode && !procp->pc_decode(rqstp, argv->iov_base))
>>> +		goto out_decode_err;
>>> 
>>> -	/* Check whether we have this call in the cache. */
>>> 	switch (nfsd_cache_lookup(rqstp)) {
>>> 	case RC_DROPIT:
>>> -		return 0;
>>> +		goto out_dropit;
>>> 	case RC_REPLY:
>>> -		return 1;
>>> -	case RC_DOIT:;
>>> -		/* do it */
>>> +		goto out_cached_reply;
>>> +	case RC_DOIT:
>>> +		break;
>>> 	}
>>> 
>>> -	/* need to grab the location to store the status, as
>>> -	 * nfsv4 does some encoding while processing 
>>> -	 */
>>> -	nfserrp = rqstp->rq_res.head[0].iov_base
>>> -		+ rqstp->rq_res.head[0].iov_len;
>>> -	rqstp->rq_res.head[0].iov_len += sizeof(__be32);
>>> -
>>> -	/* Now call the procedure handler, and encode NFS status. */
>>> +	nfserrp = resv->iov_base + resv->iov_len;
>>> +	resv->iov_len += sizeof(__be32);
>>> 	nfserr = proc->pc_func(rqstp);
>>> -	nfserr = map_new_errors(rqstp->rq_vers, nfserr);
>>> -	if (nfserr == nfserr_dropit || test_bit(RQ_DROPME, &rqstp->rq_flags)) {
>>> -		dprintk("nfsd: Dropping request; may be revisited later\n");
>>> -		nfsd_cache_update(rqstp, RC_NOCACHE, NULL);
>>> -		return 0;
>>> -	}
>>> -
>>> -	if (rqstp->rq_proc != 0)
>>> +	if (test_bit(RQ_DROPME, &rqstp->rq_flags))
>>> +		goto out_update_drop;
>>> +	if (rqstp->rq_proc)
>>> 		*nfserrp++ = nfserr;
>>> 
>>> -	/* Encode result.
>>> -	 * For NFSv2, additional info is never returned in case of an error.
>>> -	 */
>>> -	if (!(nfserr && rqstp->rq_vers == 2)) {
>>> -		if (proc->pc_encode && !proc->pc_encode(rqstp, nfserrp)) {
>>> -			/* Failed to encode result. Release cache entry */
>>> -			dprintk("nfsd: failed to encode result!\n");
>>> -			nfsd_cache_update(rqstp, RC_NOCACHE, NULL);
>>> -			*statp = rpc_system_err;
>>> -			return 1;
>>> -		}
>>> -	}
>>> +	if (proc->pc_encode && !proc->pc_encode(rqstp, nfserrp))
>>> +		goto out_encode_err;
>>> 
>>> -	/* Store reply in cache. */
>>> 	nfsd_cache_update(rqstp, rqstp->rq_cachetype, statp + 1);
>>> 	return 1;
>>> +
>>> +out_decode_err:
>>> +	trace_nfsd_svc_decode_err(rqstp);
>>> +	*statp = rpc_garbage_args;
>>> +	return 1;
>>> +
>>> +out_update_drop:
>>> +	nfsd_cache_update(rqstp, RC_NOCACHE, NULL);
>>> +out_dropit:
>>> +	trace_nfsd_svc_dropit(rqstp);
>>> +	return 0;
>>> +
>>> +out_cached_reply:
>>> +	trace_nfsd_svc_cached_reply(rqstp);
>>> +	return 1;
>>> +
>>> +out_encode_err:
>>> +	trace_nfsd_svc_encode_err(rqstp);
>>> +	nfsd_cache_update(rqstp, RC_NOCACHE, NULL);
>>> +	*statp = rpc_system_err;
>>> +	return 1;
>>> }
>>> 
>>> int nfsd_pool_stats_open(struct inode *inode, struct file *file)
>>> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
>>> index 53115fbc00b2..50ab4a84c25f 100644
>>> --- a/fs/nfsd/trace.h
>>> +++ b/fs/nfsd/trace.h
>>> @@ -32,6 +32,56 @@
>>> 		{ NFSD_MAY_READ_IF_EXEC,	"READ_IF_EXEC" },	\
>>> 		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" })
>>> 
>>> +DECLARE_EVENT_CLASS(nfsd_simple_class,
>>> +	TP_PROTO(
>>> +		const struct svc_rqst *rqstp
>>> +	),
>>> +	TP_ARGS(rqstp),
>>> +	TP_STRUCT__entry(
>>> +		__field(u32, xid)
>>> +	),
>>> +	TP_fast_assign(
>>> +		__entry->xid = be32_to_cpu(rqstp->rq_xid);
>>> +	),
>>> +	TP_printk("xid=0x%08x", __entry->xid)
>>> +);
>>> +
>>> +#define DEFINE_NFSD_SIMPLE_EVENT(name)			\
>>> +DEFINE_EVENT(nfsd_simple_class, nfsd_##name,		\
>>> +	TP_PROTO(const struct svc_rqst *rqstp),		\
>>> +	TP_ARGS(rqstp))
>>> +
>>> +DEFINE_NFSD_SIMPLE_EVENT(svc_too_large_err);
>>> +DEFINE_NFSD_SIMPLE_EVENT(svc_decode_err);
>>> +DEFINE_NFSD_SIMPLE_EVENT(svc_dropit);
>>> +DEFINE_NFSD_SIMPLE_EVENT(svc_cached_reply);
>>> +DEFINE_NFSD_SIMPLE_EVENT(svc_encode_err);
>>> +
>>> +TRACE_EVENT(nfsd_svc_status,
>>> +	TP_PROTO(
>>> +		const struct svc_rqst *rqstp,
>>> +		const struct svc_procedure *proc,
>>> +		__be32 status
>>> +	),
>>> +	TP_ARGS(rqstp, proc, status),
>>> +	TP_STRUCT__entry(
>>> +		__field(u32, xid)
>>> +		__field(u32, version)
>>> +		__field(unsigned long, status)
>>> +		__string(procedure, rqstp->rq_procinfo->pc_name)
>>> +	),
>>> +	TP_fast_assign(
>>> +		__entry->xid = be32_to_cpu(rqstp->rq_xid);
>>> +		__entry->version = rqstp->rq_vers;
>>> +		__entry->status = be32_to_cpu(status);
>>> +		__assign_str(procedure, rqstp->rq_procinfo->pc_name);
>>> +	),
>>> +	TP_printk("xid=0x%08x version=%u procedure=%s status=%s",
>>> +		__entry->xid, __entry->version, __get_str(procedure),
>>> +		show_nfs_status(__entry->status)
>>> +	)
>>> +);
>>> +
>>> TRACE_EVENT(nfsd_access,
>>> 	TP_PROTO(
>>> 		const struct svc_rqst *rqstp,
>>> 
> 
> --
> Chuck Lever

--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux