Is there an asynchronous version of ib_dma_sync? Because it flushes
DMA pipelines, I'm wondering if kicking it off early might improve
latency, getting it done before svc_rdma_recvfrom() needs to dig
into the contents.
Tom.
On 1/21/2021 3:53 PM, Chuck Lever wrote:
The Receive completion handler doesn't look at the contents of the
Receive buffer. The DMA sync isn't terribly expensive but it's one
less thing that needs to be done by the Receive completion handler,
which is single-threaded (per svc_xprt). This helps scalability.
Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
---
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ab0b7e9777bc..6d28f23ceb35 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -342,9 +342,6 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
/* All wc fields are now known to be valid */
ctxt->rc_byte_len = wc->byte_len;
- ib_dma_sync_single_for_cpu(rdma->sc_pd->device,
- ctxt->rc_recv_sge.addr,
- wc->byte_len, DMA_FROM_DEVICE);
spin_lock(&rdma->sc_rq_dto_lock);
list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
@@ -851,6 +848,9 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
spin_unlock(&rdma_xprt->sc_rq_dto_lock);
percpu_counter_inc(&svcrdma_stat_recv);
+ ib_dma_sync_single_for_cpu(rdma_xprt->sc_pd->device,
+ ctxt->rc_recv_sge.addr, ctxt->rc_byte_len,
+ DMA_FROM_DEVICE);
svc_rdma_build_arg_xdr(rqstp, ctxt);
/* Prevent svc_xprt_release from releasing pages in rq_pages