> On Jan 3, 2019, at 11:00 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Thu, 2019-01-03 at 17:49 -0500, Chuck Lever wrote:
>>> On Jan 3, 2019, at 4:35 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>
>>>> On Jan 3, 2019, at 4:28 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>> On Thu, 2019-01-03 at 16:07 -0500, Chuck Lever wrote:
>>>>>> On Jan 3, 2019, at 3:53 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>> On Jan 3, 2019, at 1:47 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Thu, 2019-01-03 at 13:29 -0500, Chuck Lever wrote:
>>>>>>>> +	reclen = req->rq_snd_buf.len;
>>>>>>>> +	marker = cpu_to_be32(RPC_LAST_STREAM_FRAGMENT | reclen);
>>>>>>>> +	return kernel_sendmsg(transport->sock, &msg, &iov, 1, iov.iov_len);
>>>>>>>
>>>>>>> So what does this do for performance? I'd expect that adding
>>>>>>> another dive into the socket layer will come with penalties.
>>>>>>
>>>>>> NFSv3 on TCP, sec=sys, 56Gb/s IPoIB, v4.20 + my v4.21 patches
>>>>>> fio, 8KB random, 70% read, 30% write, 16 threads, iodepth=16
>>>>>>
>>>>>> Without this patch:
>>>>>>
>>>>>> read: IOPS=28.7k, BW=224MiB/s (235MB/s)(11.2GiB/51092msec)
>>>>>> write: IOPS=12.3k, BW=96.3MiB/s (101MB/s)(4918MiB/51092msec)
>>>>>>
>>>>>> With this patch:
>>>>>>
>>>>>> read: IOPS=28.6k, BW=224MiB/s (235MB/s)(11.2GiB/51276msec)
>>>>>> write: IOPS=12.3k, BW=95.8MiB/s (100MB/s)(4914MiB/51276msec)
>>>>>>
>>>>>> Seems like that's in the noise.
>>>>>
>>>>> Sigh. That's because it was the same kernel. Again, with feeling:
>>>>>
>>>>> 4.20.0-rc7-00048-g9274254:
>>>>> read: IOPS=28.6k, BW=224MiB/s (235MB/s)(11.2GiB/51276msec)
>>>>> write: IOPS=12.3k, BW=95.8MiB/s (100MB/s)(4914MiB/51276msec)
>>>>>
>>>>> 4.20.0-rc7-00049-ga4dea15:
>>>>> read: IOPS=27.2k, BW=212MiB/s (223MB/s)(11.2GiB/53979msec)
>>>>> write: IOPS=11.7k, BW=91.1MiB/s (95.5MB/s)(4917MiB/53979msec)
>>>>
>>>> So about a 5% reduction in performance?
>>>
>>> On this workload, yes.
>>>
>>> Could send the record marker in xs_send_kvec with the head[0]
>>> iovec. I'm going to try that next.
>>
>> That helps:
>>
>> Linux 4.20.0-rc7-00049-g664f679 #651 SMP Thu Jan 3 17:35:26 EST 2019
>>
>> read: IOPS=28.7k, BW=224MiB/s (235MB/s)(11.2GiB/51185msec)
>> write: IOPS=12.3k, BW=96.1MiB/s (101MB/s)(4919MiB/51185msec)
>
> Interesting... Perhaps we might be able to eke out a few more percent
> performance on file writes by also converting xs_send_pagedata() to
> use a single sock_sendmsg() w/ iov_iter rather than looping through
> several calls to sendpage()?

IMO...

For small requests (say, smaller than 17 pages), packing the head,
pagevec, and tail into an iov_iter and sending them all via a single
sock_sendmsg call would likely be efficient. For larger requests,
other overheads would dominate.

And you'd have to keep around an iter array that holds 257 entries...
You could pass a large pagevec to sock_sendmsg in smaller chunks.

Are you thinking of converting xs_sendpages (or even xdr_bufs) to use
iov_iter directly?
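For concreteness, the "marker with head[0]" trick above might look
something like the sketch below: one kernel_sendmsg() carries both the
4-byte record marker and the head kvec, so the marker costs no extra
trip into the socket layer. The function name is made up for
illustration; this is not the actual xs_send_kvec() change.

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/sunrpc/xprt.h>
#include <linux/sunrpc/msg_prot.h>

static int xs_send_marker_and_head(struct socket *sock,
				   struct rpc_rqst *req, bool more)
{
	struct msghdr msg = {
		.msg_flags = MSG_DONTWAIT | (more ? MSG_MORE : 0),
	};
	/* Stream record marker: length with the last-fragment bit set */
	rpc_fraghdr marker = cpu_to_be32(RPC_LAST_STREAM_FRAGMENT |
					 req->rq_snd_buf.len);
	struct kvec iov[2] = {
		{ .iov_base = &marker, .iov_len = sizeof(marker) },
		req->rq_snd_buf.head[0],
	};

	/* One dive into the socket layer for marker + head */
	return kernel_sendmsg(sock, &msg, iov, 2,
			      iov[0].iov_len + iov[1].iov_len);
}

And the pagevec conversion being suggested might take this shape: walk
the xdr_buf's pages once, build a bio_vec array, and hand the whole
thing to sock_sendmsg() in one call instead of looping over sendpage().
Again a sketch only: the function name and XS_SEND_MAXPAGES are
hypothetical, and it assumes an iov_iter_bvec() that takes a bare
direction (newer kernels spell that ITER_SOURCE rather than WRITE).

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/bvec.h>
#include <linux/uio.h>
#include <linux/sunrpc/xdr.h>

/* 1MB of payload plus a partial first page -- the "257 entries" */
#define XS_SEND_MAXPAGES 257

static int xs_sendmsg_pagedata(struct socket *sock, struct xdr_buf *xdr,
			       unsigned int base, bool more)
{
	struct msghdr msg = {
		.msg_flags = MSG_DONTWAIT | (more ? MSG_MORE : 0),
	};
	unsigned int offset, len, want = xdr->page_len - base;
	unsigned int remaining = want, i = 0;
	struct page **pages = xdr->pages;
	struct bio_vec *bvec;
	int ret;

	bvec = kcalloc(XS_SEND_MAXPAGES, sizeof(*bvec), GFP_NOFS);
	if (!bvec)
		return -ENOMEM;

	/* The page data may start part-way into the first page */
	offset = xdr->page_base + base;
	pages += offset >> PAGE_SHIFT;
	offset &= ~PAGE_MASK;

	/* One bio_vec entry per (partial) page of payload */
	while (remaining && i < XS_SEND_MAXPAGES) {
		len = min_t(unsigned int, PAGE_SIZE - offset, remaining);
		bvec[i].bv_page = *pages++;
		bvec[i].bv_offset = offset;
		bvec[i].bv_len = len;
		remaining -= len;
		offset = 0;
		i++;
	}

	/* One dive into the socket layer for the whole pagevec */
	iov_iter_bvec(&msg.msg_iter, WRITE, bvec, i, want - remaining);
	ret = sock_sendmsg(sock, &msg);

	kfree(bvec);
	return ret;
}

Note the kcalloc on every send is exactly the cost of that 257-entry
array; a real conversion would presumably cache the array alongside the
request, or send a large pagevec in smaller chunks as mentioned above.

--
Chuck Lever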