On 1/19/23 19:38, Zhu Yanjun wrote: > On Fri, Jan 20, 2023 at 3:09 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote: >> >> Currently the rxe driver, in rare situations, can respond incorrectly >> to zero length operations which are retried. The client does not >> have to provide an rkey for zero length RDMA operations so the rkey >> may be invalid. The driver saves this rkey in the responder resources >> to replay the rdma operation if a retry is required so the second pass >> will use this (potentially) invalid rkey which may result in memory >> faults. >> >> This patch corrects the driver to ignore the provided rkey if the >> reth length is zero and make sure to set the mr to NULL. In read_reply() >> if the length is zero the MR is set to NULL. Warnings are added in >> the routines in rxe_mr.c to catch NULL MRs when the length is non-zero. >> > > There is a patch in the following link: > > https://patchwork.kernel.org/project/linux-rdma/patch/20230113023527.728725-1-baijiaju1990@xxxxxxxxx/ > > Not sure whether it is similar or not. > > Zhu Yanjun > >> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx> >> --- >> drivers/infiniband/sw/rxe/rxe_mr.c | 9 +++++++ >> drivers/infiniband/sw/rxe/rxe_resp.c | 36 +++++++++++++++++++++------- >> 2 files changed, 36 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c >> index 072eac4b65d2..134a74f315c2 100644 >> --- a/drivers/infiniband/sw/rxe/rxe_mr.c >> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c >> @@ -267,6 +267,9 @@ void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length) >> int m, n; >> void *addr; >> >> + if (WARN_ON(!mr)) >> + return NULL; >> + >> if (mr->state != RXE_MR_STATE_VALID) { >> rxe_dbg_mr(mr, "Not in valid state\n"); >> addr = NULL; >> @@ -305,6 +308,9 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length) >> if (length == 0) >> return 0; >> >> + if (WARN_ON(!mr)) >> + return -EINVAL; >> + >> if (mr->ibmr.type == IB_MR_TYPE_DMA) >> return -EFAULT; >> >> @@ -349,6 +355,9 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, >> if (length == 0) >> return 0; >> >> + if (WARN_ON(!mr)) >> + return -EINVAL; >> + >> if (mr->ibmr.type == IB_MR_TYPE_DMA) { >> u8 *src, *dest; >> >> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c >> index c74972244f08..a528dc25d389 100644 >> --- a/drivers/infiniband/sw/rxe/rxe_resp.c >> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c >> @@ -457,13 +457,23 @@ static enum resp_states rxe_resp_check_length(struct rxe_qp *qp, >> return RESPST_CHK_RKEY; >> } >> >> +/* if the reth length field is zero we can assume nothing >> + * about the rkey value and should not validate or use it. >> + * Instead set qp->resp.rkey to 0 which is an invalid rkey >> + * value since the minimum index part is 1. >> + */ >> static void qp_resp_from_reth(struct rxe_qp *qp, struct rxe_pkt_info *pkt) >> { >> + unsigned int length = reth_len(pkt); >> + >> qp->resp.va = reth_va(pkt); >> qp->resp.offset = 0; >> - qp->resp.rkey = reth_rkey(pkt); >> - qp->resp.resid = reth_len(pkt); >> - qp->resp.length = reth_len(pkt); >> + qp->resp.resid = length; >> + qp->resp.length = length; >> + if (length) >> + qp->resp.rkey = reth_rkey(pkt); >> + else >> + qp->resp.rkey = 0; >> } >> >> static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt) >> @@ -512,8 +522,8 @@ static enum resp_states check_rkey(struct rxe_qp *qp, >> >> /* A zero-byte op is not required to set an addr or rkey. See C9-88 */ >> if ((pkt->mask & RXE_READ_OR_WRITE_MASK) && >> - (pkt->mask & RXE_RETH_MASK) && >> - reth_len(pkt) == 0) { >> + (pkt->mask & RXE_RETH_MASK) && reth_len(pkt) == 0) { >> + qp->resp.mr = NULL; >> return RESPST_EXECUTE; >> } >> >> @@ -592,6 +602,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp, >> return RESPST_EXECUTE; >> >> err: >> + qp->resp.mr = NULL; >> if (mr) >> rxe_put(mr); >> if (mw) >> @@ -966,7 +977,10 @@ static enum resp_states read_reply(struct rxe_qp *qp, >> } >> >> if (res->state == rdatm_res_state_new) { >> - if (!res->replay) { >> + if (qp->resp.length == 0) { >> + mr = NULL; >> + qp->resp.mr = NULL; >> + } else if (!res->replay) { >> mr = qp->resp.mr; >> qp->resp.mr = NULL; >> } else { >> @@ -980,9 +994,13 @@ static enum resp_states read_reply(struct rxe_qp *qp, >> else >> opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; >> } else { >> - mr = rxe_recheck_mr(qp, res->read.rkey); >> - if (!mr) >> - return RESPST_ERR_RKEY_VIOLATION; >> + if (qp->resp.length == 0) { >> + mr = NULL; >> + } else { >> + mr = rxe_recheck_mr(qp, res->read.rkey); >> + if (!mr) >> + return RESPST_ERR_RKEY_VIOLATION; >> + } >> >> if (res->read.resid > mtu) >> opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; >> -- >> 2.37.2 >> Zhu, It relates since he is checking for NULL MRs. But I don't think it addresses the root causes. The patch I sent should eliminate NULL MRs together with length != 0 in the copy routines. I added WARN_ON's in case someone changes things later and we hit this again. (A warning is more useful than a fault which can be very hard to diagnose.) The two changes I made that attack the cause of problems are (1) clearing qp->resp.mr in check_rkey() in the alternate paths. The primary path demands that it get set with a valid mr. But on the alternate paths it isn't set at all and can leave with a stale, invalid or wrong mr value. (2) in read_reply() there is an error path where a zero length read fails to get acked and the requester retries the operation and sends a second request. This will end up in read_reply and as currently written attempt to lookup the rkey and turn it into an MR but no valid rkey is required in a zero length operation so this is likely to fail. The fixes treats length == 0 as a special case and force a NULL mr. This should not trigger a fault in the mr copy/etc. routines since they always check for length == 0 and return or require a non zero length. Thanks, Bob