On 30/08/2022 17:44, Li Zhijian wrote:
On 29/08/2022 18:21, Matsuda, Daisuke/松田 大輔 wrote:
On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
On 29/08/2022 13:44, Daisuke Matsuda wrote:
An incoming Read request causes multiple Read responses. If a user MR to
copy data from is unavailable or responder cannot send a reply, then the
error messages can be printed for each response attempt, resulting in
message overflow.
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
---
drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index b36ec5c4d5e0..4b3e8aec2fb8 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
payload, RXE_FROM_MR_OBJ);
Looks i was missing the the faulting point inside rxe_mr_copy()
mr_check_range() is the only point to return an error inside rxe_mr_copy()
and mr_check_range() would never fail in this moment, since it is always tested by RESPST_CHK_RKEY
before calling read_reply()
so it's safe to remove this print, and add some comments ?
Thanks
Zhijian
- if (err)
- pr_err("Failed copying memory\n");
Not relate to this patch.
I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
IMO, when err happens, responder shall notify the request anyhow.
Practically, I have never seen rxe_mr_copy() failed before,
but I agree the implementation may be incorrect as you mentioned.
As far as I tested, responder replied with the requested amount of payloads
even when rxe_mr_copy() is modified to fail. In this case,
requester may mistakenly believe that they get data correctly.
For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).
it seems it's suitable to reply NAK code "REMOTE ACCESS ERROR" to the requester side
by returning RESPST_ERR_RKEY_VIOLATION here.
see "9.7.5.2.4 REMOTE ACCESS ERROR" and "9.7.4.1.5 RESPONDER R_KEY VALIDATION"
Daisuke Matsuda
Thanks
Zhijian
if (mr)
rxe_put(mr);
@@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
}
err = rxe_xmit_packet(qp, &ack_pkt, skb);
- if (err) {
- pr_err("Failed sending RDMA reply.\n");
+ if (err)
return RESPST_ERR_RNR;
- }
res->read.va += payload;
res->read.resid -= payload;