> On Oct 13, 2016, at 9:36 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
>>
>> On Oct 13, 2016, at 2:35 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>>
>> On Wed, Oct 12, 2016 at 01:42:26PM -0400, Chuck Lever wrote:
>>> I'm studying the way that the ->recvfrom and ->sendto calls work
>>> for RPC-over-RDMA.
>>>
>>> The ->sendto path moves pages out of the svc_rqst before posting
>>> I/O (RDMA Write and Send). Once the work is posted, ->sendto
>>> returns, and it looks like the svc_rqst is released at that
>>> point. The subsequent completion of the Send then releases those
>>> moved pages.
>>>
>>> I'm wondering if the transport can be simplified: instead of
>>> moving pages around, ->sendto could just wait until the Write and
>>> Send activity is complete, then return. The upper layer then
>>> releases everything.
>>
>> I'd prefer not to block for no reason at all.
>
> The reason to block is to wait for I/O to complete. Can you
> elaborate: for example, are you leery of an extra context
> switch?
>
> Note that in the Read/recvfrom case, RDMA Reads are posted, and
> then svc_rdma_recvfrom returns "deferred." When the Reads
> complete, the completion handler enqueues the work on the
> svc_xprt, and svc_rdma_recvfrom is called again. Can someone
> explain why svc_rdma_recvfrom doesn't just wait for the RDMA
> Reads to complete?
>
> What is the purpose of pulling the pages out of the svc_rqst
> while waiting for the RDMA Reads?
>
>>> Another option would be for ->sendto to return a value that
>>> means the transport will release the svc_rqst and pages.
>>
>> Or just let the transport always release it. We only have two
>> different implementations of the relevant ops anyway.
>
> If each transport is responsible for releasing the svc_rqst and
> pages, code is duplicated. Both transports need to do this
> releasing, so it should be done in common code (the svc RPC
> code).
>
> I'd prefer not to have to touch the TCP transport in order to
> improve the RDMA transport implementation. However, some
> clean-up and refactoring in this area may be unavoidable.

svc_send does this:

 938         mutex_lock(&xprt->xpt_mutex);
 943         len = xprt->xpt_ops->xpo_sendto(rqstp);
 944         mutex_unlock(&xprt->xpt_mutex);
 945         rpc_wake_up(&xprt->xpt_bc_pending);
 946         svc_xprt_release(rqstp);

Thus waiting in ->sendto is indeed not good: xpt_mutex is held
across that call, so blocking there until the Write and Send
completed would hold up every other reply on the same transport.

But I think you're suggesting that svc_xprt_release() should be
invoked by the transport, and not by svc_send(). That could work,
provided there isn't a hidden dependency on the ordering of lines
944-946.

The socket transports appear to complete send processing
synchronously, so they could release the svc_rqst from inside
->sendto itself; but that would move svc_xprt_release() into the
xpt_mutex critical section, which would probably have a negative
performance impact.

We could instead add another xpo callout here. For sockets it
would invoke svc_xprt_release(), and for RDMA it would be a
no-op. RDMA would then be responsible for invoking
svc_xprt_release() in its Send completion handler.

--
Chuck Lever
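
For concreteness, a rough sketch of what that callout could look
like. The op name xpo_release_reply and both helpers below are
made-up placeholders, not existing interfaces; note also that
svc_xprt_release() is static in net/sunrpc/svc_xprt.c today, so
it would have to be made visible to transport code:

/* include/linux/sunrpc/svc_xprt.h: new member of svc_xprt_ops */
	void	(*xpo_release_reply)(struct svc_rqst *rqstp);

/* net/sunrpc/svc_xprt.c: svc_send() hands the release decision
 * to the transport instead of calling svc_xprt_release() itself,
 * and does so outside the xpt_mutex critical section.
 */
	mutex_lock(&xprt->xpt_mutex);
	len = xprt->xpt_ops->xpo_sendto(rqstp);
	mutex_unlock(&xprt->xpt_mutex);
	rpc_wake_up(&xprt->xpt_bc_pending);
	xprt->xpt_ops->xpo_release_reply(rqstp);

/* net/sunrpc/svcsock.c: sockets have finished with the reply by
 * the time xpo_sendto() returns, so they release the svc_rqst
 * immediately, preserving today's behavior.
 */
static void svc_sock_release_reply(struct svc_rqst *rqstp)
{
	svc_xprt_release(rqstp);
}

/* net/sunrpc/xprtrdma/svc_rdma_transport.c: a no-op here; the
 * Send completion handler would call svc_xprt_release(rqstp)
 * once the device is done with the reply pages.
 */
static void svc_rdma_release_reply(struct svc_rqst *rqstp)
{
}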