> On Oct 19, 2016, at 1:26 PM, bfields@xxxxxxxxxxxx wrote: > > On Wed, Oct 12, 2016 at 01:42:26PM -0400, Chuck Lever wrote: >> I'm studying the way that the ->recvfrom and ->sendto calls work >> for RPC-over-RDMA. >> >> The ->sendto path moves pages out of the svc_rqst before posting >> I/O (RDMA Write and Send). Once the work is posted, ->sendto >> returns, and looks like svc_rqst is released at that point. The >> subsequent completion of the Send then releases those moved pages. >> >> I'm wondering if the transport can be simplified: instead of >> moving pages around, ->sendto could just wait until the Write and >> Send activity is complete, then return. The upper layer then >> releases everything. > > I don't understand what problem you're trying to fix. Is "moving pages > around" really especially complicated or expensive? Moving pages is complicated and undocumented, and adds host CPU and memory costs (loads and stores). There is also a whole lot of page allocator traffic in this code, and that will be enabling and disabling interrupts and bottom-halves with some frequency. The less involvement by the host CPU the better. The current RDMA transport code documents the RPC-over-RDMA protocol to some extent, but the need for moving the pages is to the reader's imagination. >From my code audit: The svc_rqst is essentially a static piece of memory, and the calling NFSD kthread will re-use it immediately when svc_sendto() returns. It cannot be manipulated asynchronously. The recvfrom calls and the sendto call for one RPC request can each use a different svc_rqst, for example. So if the transport needs to use pages for longer the length of one svc_sendto call (for example, to wait for RDMA Write to complete) then it has to move those pages out of the svc_rqst before it returns. (Let me know if I've got this wrong). I'm falling back to looking at ways to clean up and document the movement of these pages so it is better understood why it is done. There also could be some opportunity to share page movement helpers between the RDMA read and write path (and perhaps the TCP transport too). HTH. > --b. > >> >> Another option would be for ->sendto to return a value that means >> the transport will release the svc_rqst and pages. >> >> Or, the svc_rqst could be reference counted. >> >> Anyone have thoughts about this? -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html