> On Jun 16, 2017, at 2:42 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> On Fri, Jun 16, 2017 at 02:37:40PM -0400, Chuck Lever wrote:
>>
>>> On Jun 16, 2017, at 1:52 PM, bfields@xxxxxxxxxxxx wrote:
>>>
>>> Just repeating some comments from the bug:
>>>
>>> On Fri, Jun 16, 2017 at 11:22:54AM -0400, Chuck Lever wrote:
>>>> Running a multi-threaded 8KB fio test (70/30 mix), three or four out
>>>> of twelve of the jobs fail when using krb5i. The failure is an EIO
>>>> on a read.
>>>>
>>>> Troubleshooting confirmed the EIO results when the client fails to
>>>> verify the MIC of an NFS READ reply. Bruce suggested the problem
>>>> could be due to the data payload changing between the time the
>>>> reply's MIC was computed on the server and the time the reply was
>>>> actually sent.
>>>>
>>>> krb5p gets around this problem by disabling RQ_SPLICE_OK.
>>>
>>> And you verified that this does fix the problem in your case.
>>
>> I've had this applied to my server for a week or so. There
>> hasn't been a single recurrence of the issue.
>>
>>
>>> So, I think it's a simple fix and probably the best we can do without a
>>> lot more work, so I'm happy applying it.
>>>
>>> That said, I'm still curious about the performance:
>>>
>>>> I would say that there is not much difference in this test.
>>>
>>> We added an extra copy to the read path and it didn't seem to affect
>>> throughput of streaming read much--I think that just says memory
>>> bandwidth isn't the bottleneck in this case? Which doesn't seem too
>>> surprising.
>>
>> With krb5i, an additional memory copy is minor compared to the
>> computation needed.
>>
>> I'm testing with 56Gbps networking and a tmpfs export. I'm not
>> exhausting the CPU on my 4-core server, even with krb5p. The
>> effects could be seen in a scalability test, but I don't have
>> anything that pushes my server that hard.
>>
>>
>>> I wonder what we should be looking for--maybe running the same test but
>>> also measuring CPU usage somehow.
>>
>> Maybe an increase in latency. But I didn't see much change, and
>> the throughput numbers don't reflect any underlying increase in
>> per-RPC latency.
>
> OK! Thanks for looking into this.

I just noticed this comment in svc_process_common:

        /* Will be turned off only in gss privacy case: */
        set_bit(RQ_SPLICE_OK, &rqstp->rq_flags);

That comment should probably be removed by this patch, since
RQ_SPLICE_OK will now be turned off in the gss integrity case too.

--
Chuck Lever
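A toy userspace model of the race described in this thread may help: a MIC computed over a page that is later changed before it goes on the wire (the spliced path), versus copying the payload into a private buffer first (the extra copy on the read path). This is a sketch under stated assumptions, not the SUNRPC code: toy_mic() is a 32-bit FNV-1a hash standing in for the Kerberos MIC, reply_spliced(), reply_copied() and PAGE_BYTES are hypothetical names, and the "concurrent" write is simulated sequentially.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 64 /* hypothetical: tiny stand-in for a page-cache page */

/* Stand-in integrity check: 32-bit FNV-1a. This is NOT the Kerberos
 * MIC; it only models "a checksum computed over the payload bytes". */
static uint32_t toy_mic(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/* Splice-like path: the reply references the page itself, so the bytes
 * sent are whatever the page holds at send time.
 * Returns 1 if the receiver's check would pass, 0 if it would fail. */
static int reply_spliced(unsigned char *page)
{
    uint32_t mic = toy_mic(page, PAGE_BYTES); /* MIC computed now...       */
    page[0] ^= 0xff;                          /* ...a write hits the page...*/
    const unsigned char *wire = page;         /* ...then the page is sent   */
    return toy_mic(wire, PAGE_BYTES) == mic;
}

/* Copying path: the payload is copied into a private buffer before the
 * MIC is computed, so later writes to the page cannot change what is sent. */
static int reply_copied(unsigned char *page)
{
    unsigned char wire[PAGE_BYTES];
    memcpy(wire, page, PAGE_BYTES);           /* the extra copy            */
    uint32_t mic = toy_mic(wire, PAGE_BYTES);
    page[0] ^= 0xff;                          /* same write to the page    */
    return toy_mic(wire, PAGE_BYTES) == mic;
}
```

Called on a freshly filled page, reply_spliced() returns 0 (the check fails, analogous to the client's EIO) while reply_copied() returns 1, at the cost of one memcpy per reply. Because every FNV-1a step is a bijection on the running state, the single-byte change is guaranteed to alter the toy MIC.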