> On Jan 29, 2021, at 5:43 PM, NeilBrown <neilb@xxxxxxx> wrote: > > On Fri, Jan 29 2021, Chuck Lever wrote: > >> Hi Neil- >> >> I'd like to reduce the amount of page allocation that NFSD does, >> and was wondering about the release and reset of pages in >> svc_xprt_release(). This logic was added when the socket transport >> was converted to use kernel_sendpage() back in 2002. Do you >> remember why releasing the result pages is necessary? >> > > Hi Chuck, > as I recall, kernel_sendpage() (or sock->ops->sendpage() as it was > then) takes a reference to the page and will hold that reference until > the content has been sent and ACKed. nfsd has no way to know when the > ACK comes, so cannot know when the page can be re-used, so it must > release the page and allocate a new one. > > This is the price we pay for zero-copy, and I acknowledge that it is a > real price. I wouldn't be surprised if the trade-offs between > zero-copy and single-copy change over time, and between different > hardware. Very interesting, thanks for the history! Two observations: - I thought without MSG_DONTWAIT, the sendpage operation would be total synchronous -- when the network layer was done with retransmissions, it would unblock the caller. But that's likely a mistaken assumption on my part. That could be why sendmsg is so much slower than sendpage in this particular application. - IIUC, nfsd_splice_read() replaces anonymous pages in rq_pages with actual page cache pages. Those of course cannot be used to construct subsequent RPC Replies, so that introduces a second release requirement. So I have a way to make the first case unnecessary for RPC/RDMA. It has a reliable Send completion mechanism. Sounds like releasing is still necessary for TCP, though; maybe that could be done in the xpo_release_rqst callback. As far as nfsd_splice_read(), I had thought of moving those pages to a separate array which would always be released. That would need to deal with the transport requirements above. If nothing else, I would like to add mention of these requirements somewhere in the code too. What's your opinion? -- Chuck Lever