> On Jun 16, 2017, at 2:42 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> On Fri, Jun 16, 2017 at 02:37:40PM -0400, Chuck Lever wrote:
>>
>>> On Jun 16, 2017, at 1:52 PM, bfields@xxxxxxxxxxxx wrote:
>>>
>>> Just repeating some comments from the bug:
>>>
>>> On Fri, Jun 16, 2017 at 11:22:54AM -0400, Chuck Lever wrote:
>>>> Running a multi-threaded 8KB fio test (70/30 mix), three or four out
>>>> of twelve of the jobs fail when using krb5i. The failure is an EIO
>>>> on a read.
>>>>
>>>> Troubleshooting confirmed the EIO results when the client fails to
>>>> verify the MIC of an NFS READ reply. Bruce suggested the problem
>>>> could be due to the data payload changing between the time the
>>>> reply's MIC was computed on the server and the time the reply was
>>>> actually sent.
>>>>
>>>> krb5p gets around this problem by disabling RQ_SPLICE_OK.
>>>
>>> And you verified that this does fix the problem in your case.
>>
>> I've had this applied to my server for a week or so. There
>> hasn't been a single recurrence of the issue.
>>
>>
>>> So, I think it's a simple fix and probably the best we can do without a
>>> lot more work, so I'm happy applying it.
>>>
>>> That said, I'm still curious about the performance:
>>>
>>>> I would say that there is not much difference in this test.
>>>
>>> We added an extra copy to the read path and it didn't seem to affect
>>> throughput of streaming read much--I think that just says memory
>>> bandwidth isn't the bottleneck in this case? Which doesn't seem too
>>> surprising.
>>
>> With krb5i, an additional memory copy is minor compared to the
>> computation needed.
>>
>> I'm testing with 56Gbps networking and a tmpfs export. I'm not
>> exhausting the CPU on my 4-core server, even with krb5p. The
>> effects could be seen in a scalability test, but I don't have
>> anything that pushes my server that hard.
>>
>>
>>> I wonder what we should be looking for--maybe running the same test but
>>> also measuring CPU usage somehow.
>>
>> Maybe an increase in latency. But I didn't see much change, and
>> the throughput numbers don't reflect any underlying increase in
>> per-RPC latency.
>
> OK! Thanks for looking into this.

I just noticed this comment in svc_process_common:

1169         /* Will be turned off only in gss privacy case: */
1170         set_bit(RQ_SPLICE_OK, &rqstp->rq_flags);

That comment should probably be removed by this patch, since the flag is
now also cleared for the integrity case.

--
Chuck Lever
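
For reference, here is a minimal sketch of the approach being discussed:
clear RQ_SPLICE_OK whenever the request uses RPCSEC_GSS integrity (or
privacy) protection, so the READ payload is copied rather than spliced
and the pages cannot change between MIC computation and transmission.
The helper name and the idea of calling it from the svcauth_gss accept
path are assumptions for illustration, not the applied patch itself.

    /* Sketch only: mirror the existing krb5p behaviour for krb5i.
     * The hook point (a helper invoked from the svcauth_gss accept
     * path) and the helper name are illustrative assumptions. */
    #include <linux/sunrpc/svc.h>
    #include <linux/sunrpc/auth_gss.h>

    static void gss_clear_splice_ok(struct svc_rqst *rqstp, u32 gss_svc)
    {
            switch (gss_svc) {
            case RPC_GSS_SVC_INTEGRITY:
            case RPC_GSS_SVC_PRIVACY:
                    /* The MIC (or wrap token) is computed over the
                     * reply's data payload before the reply is sent.
                     * If the payload pages are spliced straight from
                     * the page cache they can be modified after the
                     * checksum is taken, and the client then fails MIC
                     * verification and sees EIO.  Forcing a copy keeps
                     * the checksummed data stable. */
                    clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
                    break;
            }
    }

With a per-request clear_bit() like this, the unconditional set_bit() in
svc_process_common can stay; only its "gss privacy case" comment becomes
stale, which is the point raised above.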