Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

Chuck Lever <chuck.lever@xxxxxxxxxx> · Mon, 3 Jun 2019 15:08:08 -0400

> On Jun 3, 2019, at 3:05 PM, Dave Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> 
> On Mon, 2019-06-03 at 14:56 -0400, Chuck Lever wrote:
>>> On Jun 3, 2019, at 2:53 PM, Dave Wysochanski <dwysocha@xxxxxxxxxx>
>>> wrote:
>>> 
>>> On Fri, 2019-05-31 at 09:25 -0400, Chuck Lever wrote:
>>>>> On May 30, 2019, at 6:33 PM, Bruce Fields <bfields@xxxxxxxxxxxx
>>>>>> 
>>>>> wrote:
>>>>> 
>>>>> On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
>>>>>> 
>>>>>> 
>>>>>>> On May 30, 2019, at 5:38 PM, bfields@xxxxxxxxxxxx wrote:
>>>>>>> 
>>>>>>> On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski
>>>>>>> wrote:
>>>>>>>> We often see various error conditions with NFS4.x that
>>>>>>>> show
>>>>>>>> up with
>>>>>>>> a very high operation count all completing with tk_status
>>>>>>>> < 0
>>>>>>>> in a
>>>>>>>> short period of time.  Add a count to rpc_iostats to
>>>>>>>> record
>>>>>>>> on a
>>>>>>>> per-op basis the ops that complete in this manner, which
>>>>>>>> will
>>>>>>>> enable lower overhead diagnostics.
>>>>>>> 
>>>>>>> Looks like a good idea to me.
>>>>>>> 
>>>>>>> It's too bad we can't distinguish the errors.  (E.g. ESTALE
>>>>>>> on
>>>>>>> a lookup
>>>>>>> call is a lot more interesting than ENOENT.)  But
>>>>>>> understood
>>>>>>> that
>>>>>>> maintaining (number of possible errors) * (number of
>>>>>>> possible
>>>>>>> ops)
>>>>>>> counters is probably overkill, so just counting the number
>>>>>>> of
>>>>>>> errors
>>>>>>> seems like a good start.
>>>>>> 
>>>>>> We now have trace points that can do that too.
>>>>> 
>>>>> You mean, that can report every error (and its value)?
>>>> 
>>>> Yes, the nfs_xdr_status trace point reports the error by value
>>>> and
>>>> symbolic name.
>>>> 
>>> 
>>> The tracepoint is very useful I agree.  I don't think it will show:
>>> a) the mount
>>> b) the opcode
>>> 
>>> Or am I mistaken and there's a way to get those with a filter or
>>> another tracepoint?
>> 
>> The opcode can be exposed by another trace point, but the link
>> between
>> the two trace points is tenuous and could be improved.
>> 
> 
> I think the number is there but it's not decoded!  This was for a WRITE
> completing with BAD_STATEID - very nice:
> 
>   kworker/u16:0-31130 [006] .... 949028.602298: nfs4_xdr_status: operation 38: nfs status -10025 (BAD_STATEID)

OK, I'm smarter than I look. This shows the NFS operation,
not the RPC procedure (which would be NFS4 COMPOUND), which
is the right thing to do.

>> I don't believe any of the NFS trace points expose the mount. My
>> testing
>> is largely on a single mount so my imagination stopped there.
>> 
> 
> Ok thanks for the confirmation.

Well, it's something worth thinking carefully about: if
you see a slick way to make that information available,
have at it.

>>>>> I assume having these statistics in mountstats is still useful,
>>>>> though.
>>>>> 
>>>>> --b.
>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> --b.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Signed-off-by: Dave Wysochanski <dwysocha@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>> include/linux/sunrpc/metrics.h | 7 ++++++-
>>>>>>>> net/sunrpc/stats.c             | 8 ++++++--
>>>>>>>> 2 files changed, 12 insertions(+), 3 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/include/linux/sunrpc/metrics.h
>>>>>>>> b/include/linux/sunrpc/metrics.h
>>>>>>>> index 1b3751327575..0ee3f7052846 100644
>>>>>>>> --- a/include/linux/sunrpc/metrics.h
>>>>>>>> +++ b/include/linux/sunrpc/metrics.h
>>>>>>>> @@ -30,7 +30,7 @@
>>>>>>>> #include <linux/ktime.h>
>>>>>>>> #include <linux/spinlock.h>
>>>>>>>> 
>>>>>>>> -#define RPC_IOSTATS_VERS	"1.0"
>>>>>>>> +#define RPC_IOSTATS_VERS	"1.1"
>>>>>>>> 
>>>>>>>> struct rpc_iostats {
>>>>>>>> 	spinlock_t		om_lock;
>>>>>>>> @@ -66,6 +66,11 @@ struct rpc_iostats {
>>>>>>>> 	ktime_t			om_queue,	/* queued
>>>>>>>> for
>>>>>>>> xmit */
>>>>>>>> 				om_rtt,		/* RPC
>>>>>>>> RTT */
>>>>>>>> 				om_execute;	/* RPC
>>>>>>>> execution */
>>>>>>>> +	/*
>>>>>>>> +	 * The count of operations that complete with tk_status
>>>>>>>> < 0.
>>>>>>>> +	 * These statuses usually indicate error conditions.
>>>>>>>> +	 */
>>>>>>>> +	unsigned long           om_error_status;
>>>>>>>> } ____cacheline_aligned;
>>>>>>>> 
>>>>>>>> struct rpc_task;
>>>>>>>> diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
>>>>>>>> index 8b2d3c58ffae..737414247ca7 100644
>>>>>>>> --- a/net/sunrpc/stats.c
>>>>>>>> +++ b/net/sunrpc/stats.c
>>>>>>>> @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const
>>>>>>>> struct rpc_task *task,
>>>>>>>> 
>>>>>>>> 	execute = ktime_sub(now, task->tk_start);
>>>>>>>> 	op_metrics->om_execute = ktime_add(op_metrics-
>>>>>>>>> om_execute, execute);
>>>>>>>> 
>>>>>>>> +	if (task->tk_status < 0)
>>>>>>>> +		op_metrics->om_error_status++;
>>>>>>>> 
>>>>>>>> 	spin_unlock(&op_metrics->om_lock);
>>>>>>>> 
>>>>>>>> @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct
>>>>>>>> rpc_iostats *a, struct rpc_iostats *b)
>>>>>>>> 	a->om_queue = ktime_add(a->om_queue, b->om_queue);
>>>>>>>> 	a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
>>>>>>>> 	a->om_execute = ktime_add(a->om_execute, b-
>>>>>>>>> om_execute);
>>>>>>>> 
>>>>>>>> +	a->om_error_status += b->om_error_status;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> static void _print_rpc_iostats(struct seq_file *seq,
>>>>>>>> struct
>>>>>>>> rpc_iostats *stats,
>>>>>>>> 			       int op, const struct
>>>>>>>> rpc_procinfo *procs)
>>>>>>>> {
>>>>>>>> 	_print_name(seq, op, procs);
>>>>>>>> -	seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu
>>>>>>>> %llu\n",
>>>>>>>> +	seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu
>>>>>>>> %lu\n",
>>>>>>>> 		   stats->om_ops,
>>>>>>>> 		   stats->om_ntrans,
>>>>>>>> 		   stats->om_timeouts,
>>>>>>>> @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct
>>>>>>>> seq_file *seq, struct rpc_iostats *stats,
>>>>>>>> 		   stats->om_bytes_recv,
>>>>>>>> 		   ktime_to_ms(stats->om_queue),
>>>>>>>> 		   ktime_to_ms(stats->om_rtt),
>>>>>>>> -		   ktime_to_ms(stats->om_execute));
>>>>>>>> +		   ktime_to_ms(stats->om_execute),
>>>>>>>> +		   stats->om_error_status);
>>>>>>>> }
>>>>>>>> 
>>>>>>>> void rpc_clnt_show_stats(struct seq_file *seq, struct
>>>>>>>> rpc_clnt *clnt)
>>>>>>>> -- 
>>>>>>>> 2.20.1
>>>>>> 
>>>>>> --
>>>>>> Chuck Lever
>>>> 
>>>> --
>>>> Chuck Lever
>> 
>> --
>> Chuck Lever

--
Chuck Lever