Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Nov 20, 2018, at 4:01 PM, Anna Schumaker <schumaker.anna@xxxxxxxxx> wrote:
> 
> On Tue, 2018-11-20 at 13:07 -0500, Chuck Lever wrote:
>>> On Nov 20, 2018, at 1:02 PM, Anna Schumaker <schumaker.anna@xxxxxxxxx>
>>> wrote:
>>> 
>>> Hi Chuck,
>>> 
>>> On Mon, 2018-11-19 at 10:46 -0500, Chuck Lever wrote:
>>>> Place the associated RPC transaction's XID in the upper 32 bits of
>>>> each RDMA segment's rdma_offset field. These bits are currently
>>>> always zero.
>>>> 
>>>> There are two reasons to do this:
>>>> 
>>>> - The R_key only has 8 bits that are different from registration to
>>>> registration. The XID adds more uniqueness to each RDMA segment to
>>>> reduce the likelihood of a software bug on the server reading from
>>>> or writing into memory it's not supposed to.
>>>> 
>>>> - On-the-wire RDMA Read and Write operations do not otherwise carry
>>>> any identifier that matches them up to an RPC. The XID in the
>>>> upper 32 bits will act as an eye-catcher in network captures.
>>>> 
>>>> Suggested-by: Tom Talpey <ttalpey@xxxxxxxxxxxxx>
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> ---
>>>> net/sunrpc/xprtrdma/frwr_ops.c  |    3 ++-
>>>> net/sunrpc/xprtrdma/rpc_rdma.c  |    6 +++---
>>>> net/sunrpc/xprtrdma/xprt_rdma.h |    2 +-
>>>> 3 files changed, 6 insertions(+), 5 deletions(-)
>>>> 
>>>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c
>>>> b/net/sunrpc/xprtrdma/frwr_ops.c
>>>> index 49b314d..3b260d2 100644
>>>> --- a/net/sunrpc/xprtrdma/frwr_ops.c
>>>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
>>>> @@ -344,7 +344,7 @@
>>>> */
>>>> static struct rpcrdma_mr_seg *
>>>> frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
>>>> -	    int nsegs, bool writing, struct rpcrdma_mr **out)
>>>> +	    int nsegs, bool writing, u32 xid, struct rpcrdma_mr **out)
>>>> {
>>>> 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
>>>> 	bool holes_ok = ia->ri_mrtype == IB_MR_TYPE_SG_GAPS;
>>>> @@ -398,6 +398,7 @@
>>>> 	if (unlikely(n != mr->mr_nents))
>>>> 		goto out_mapmr_err;
>>>> 
>>>> +	ibmr->iova |= ((u64)cpu_to_be32(xid)) << 32;
>>> 
>>> My mount command hangs once we make this change (I got bored and killed it
>>> after
>>> about 5 minutes).  This is with NFS over soft-RoCE in a kvm virtual machine,
>>> and
>>> I see the behavior with all NFS versions.
>>> 
>>> I hope this helps!
>> 
>> OK. Can you capture ftrace output to show us where the hang occurs?
>> I use this on the NFS client:
>> 
>> # trace-cmd record -e sunrpc -e rpcrdma
>> ^C
>> # trace-cmd report | less      (or > /tmp/trace.log)
> 
> The trace is attached!  I started the mount and walked away for a while.  When I
> came back, it reported "mount.nfs: Connection timed out".

The first operation is EXCHANGE_ID, and the client expects a
large Reply so it registers a Reply chunk.

       mount.nfs-1325  [001]   450.674910: xprtrdma_reply_chunk: task:16530@10 mr=0xffff93643876cb00 4380@0xffffd37d38bec100:0x00001a61 (last)
       mount.nfs-1325  [001]   450.674912: xprtrdma_marshal:     task:16530@10 xid=0x09c9d05d: hdr=48 xdr=232/0/0 inline/reply chunk
       mount.nfs-1325  [001]   450.674914: xprtrdma_post_send:   req=0xffff936438a80000, 2 SGEs, status=0
       mount.nfs-1325  [001]   450.674928: xprt_transmit:        peer=[192.168.100.215]:20049 xid=0x09c9d05d status=0
       mount.nfs-1325  [001]   450.674929: rpc_task_run_action:  task:16530@10 flags=5a80 state=0015 status=0 action=call_transmit_status
       mount.nfs-1325  [001]   450.674930: rpc_task_sleep:       task:16530@10 flags=5a80 state=0015 status=0 timeout=18000 queue=xprt_pending
    kworker/1:1H-177   [001]   450.676094: xprtrdma_wc_receive:  cqe=0xffff936438d66600 0 bytes: WR_FLUSH_ERR (5/0x0)
    kworker/1:1H-177   [001]   450.676095: xprtrdma_wc_receive:  cqe=0xffff936438d66400 0 bytes: WR_FLUSH_ERR (5/0x0)

The XID is 09c9d0f3, but I don't see it in the offset, which
is ffffd37d38bec100. Are you sure the patch is applied?


> I hope this helps!
> Anna
> 
>> 
>> Thanks!
>> 
>> 
>>> Anna
>>> 
>>>> 	key = (u8)(ibmr->rkey & 0x000000FF);
>>>> 	ib_update_fast_reg_key(ibmr, ++key);
>>>> 
>>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> index 9f53e02..89a2db2 100644
>>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> @@ -357,7 +357,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt
>>>> *r_xprt,
>>>> 
>>>> 	do {
>>>> 		seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
>>>> -						   false, &mr);
>>>> +						   false, rqst->rq_xid, &mr);
>>>> 		if (IS_ERR(seg))
>>>> 			return PTR_ERR(seg);
>>>> 		rpcrdma_mr_push(mr, &req->rl_registered);
>>>> @@ -415,7 +415,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt
>>>> *r_xprt,
>>>> 	nchunks = 0;
>>>> 	do {
>>>> 		seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
>>>> -						   true, &mr);
>>>> +						   true, rqst->rq_xid, &mr);
>>>> 		if (IS_ERR(seg))
>>>> 			return PTR_ERR(seg);
>>>> 		rpcrdma_mr_push(mr, &req->rl_registered);
>>>> @@ -473,7 +473,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt
>>>> *r_xprt,
>>>> 	nchunks = 0;
>>>> 	do {
>>>> 		seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
>>>> -						   true, &mr);
>>>> +						   true, rqst->rq_xid, &mr);
>>>> 		if (IS_ERR(seg))
>>>> 			return PTR_ERR(seg);
>>>> 		rpcrdma_mr_push(mr, &req->rl_registered);
>>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> index a13ccb6..2ae1ee2 100644
>>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> @@ -472,7 +472,7 @@ struct rpcrdma_memreg_ops {
>>>> 	struct rpcrdma_mr_seg *
>>>> 			(*ro_map)(struct rpcrdma_xprt *,
>>>> 				  struct rpcrdma_mr_seg *, int, bool,
>>>> -				  struct rpcrdma_mr **);
>>>> +				  u32, struct rpcrdma_mr **);
>>>> 	int		(*ro_send)(struct rpcrdma_ia *ia,
>>>> 				   struct rpcrdma_req *req);
>>>> 	void		(*ro_reminv)(struct rpcrdma_rep *rep,
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> <trace.log>

--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux