On Fri, Jan 20, 2023 at 12:27 PM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote: > > On 1/19/23 19:38, Zhu Yanjun wrote: > > On Fri, Jan 20, 2023 at 3:09 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote: > >> > >> Currently the rxe driver, in rare situations, can respond incorrectly > >> to zero length operations which are retried. The client does not > >> have to provide an rkey for zero length RDMA operations so the rkey > >> may be invalid. The driver saves this rkey in the responder resources > >> to replay the rdma operation if a retry is required so the second pass > >> will use this (potentially) invalid rkey which may result in memory > >> faults. In this link: https://lore.kernel.org/lkml/TYCPR01MB8455FC418FD61CAEE85D0D9FE5C19@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#m9ea28d1465dc2fb3469c21659e6b6c7349fc984f Daisuke Matsuda (Fujitsu) made further investigations about this problem. And Daisuke Matsuda (Fujitsu) has delved into this problem. Let us wait for his comments. Zhu Yanjun > >> > >> This patch corrects the driver to ignore the provided rkey if the > >> reth length is zero and make sure to set the mr to NULL. In read_reply() > >> if the length is zero the MR is set to NULL. Warnings are added in > >> the routines in rxe_mr.c to catch NULL MRs when the length is non-zero. > >> > > > > There is a patch in the following link: > > > > https://patchwork.kernel.org/project/linux-rdma/patch/20230113023527.728725-1-baijiaju1990@xxxxxxxxx/ > > > > Not sure whether it is similar or not. > > > > Zhu Yanjun > > > >> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx> > >> --- > >> drivers/infiniband/sw/rxe/rxe_mr.c | 9 +++++++ > >> drivers/infiniband/sw/rxe/rxe_resp.c | 36 +++++++++++++++++++++------- > >> 2 files changed, 36 insertions(+), 9 deletions(-) > >> > >> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c > >> index 072eac4b65d2..134a74f315c2 100644 > >> --- a/drivers/infiniband/sw/rxe/rxe_mr.c > >> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c > >> @@ -267,6 +267,9 @@ void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length) > >> int m, n; > >> void *addr; > >> > >> + if (WARN_ON(!mr)) > >> + return NULL; > >> + > >> if (mr->state != RXE_MR_STATE_VALID) { > >> rxe_dbg_mr(mr, "Not in valid state\n"); > >> addr = NULL; > >> @@ -305,6 +308,9 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length) > >> if (length == 0) > >> return 0; > >> > >> + if (WARN_ON(!mr)) > >> + return -EINVAL; > >> + > >> if (mr->ibmr.type == IB_MR_TYPE_DMA) > >> return -EFAULT; > >> > >> @@ -349,6 +355,9 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, > >> if (length == 0) > >> return 0; > >> > >> + if (WARN_ON(!mr)) > >> + return -EINVAL; > >> + > >> if (mr->ibmr.type == IB_MR_TYPE_DMA) { > >> u8 *src, *dest; > >> > >> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > >> index c74972244f08..a528dc25d389 100644 > >> --- a/drivers/infiniband/sw/rxe/rxe_resp.c > >> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > >> @@ -457,13 +457,23 @@ static enum resp_states rxe_resp_check_length(struct rxe_qp *qp, > >> return RESPST_CHK_RKEY; > >> } > >> > >> +/* if the reth length field is zero we can assume nothing > >> + * about the rkey value and should not validate or use it. > >> + * Instead set qp->resp.rkey to 0 which is an invalid rkey > >> + * value since the minimum index part is 1. > >> + */ > >> static void qp_resp_from_reth(struct rxe_qp *qp, struct rxe_pkt_info *pkt) > >> { > >> + unsigned int length = reth_len(pkt); > >> + > >> qp->resp.va = reth_va(pkt); > >> qp->resp.offset = 0; > >> - qp->resp.rkey = reth_rkey(pkt); > >> - qp->resp.resid = reth_len(pkt); > >> - qp->resp.length = reth_len(pkt); > >> + qp->resp.resid = length; > >> + qp->resp.length = length; > >> + if (length) > >> + qp->resp.rkey = reth_rkey(pkt); > >> + else > >> + qp->resp.rkey = 0; > >> } > >> > >> static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt) > >> @@ -512,8 +522,8 @@ static enum resp_states check_rkey(struct rxe_qp *qp, > >> > >> /* A zero-byte op is not required to set an addr or rkey. See C9-88 */ > >> if ((pkt->mask & RXE_READ_OR_WRITE_MASK) && > >> - (pkt->mask & RXE_RETH_MASK) && > >> - reth_len(pkt) == 0) { > >> + (pkt->mask & RXE_RETH_MASK) && reth_len(pkt) == 0) { > >> + qp->resp.mr = NULL; > >> return RESPST_EXECUTE; > >> } > >> > >> @@ -592,6 +602,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp, > >> return RESPST_EXECUTE; > >> > >> err: > >> + qp->resp.mr = NULL; > >> if (mr) > >> rxe_put(mr); > >> if (mw) > >> @@ -966,7 +977,10 @@ static enum resp_states read_reply(struct rxe_qp *qp, > >> } > >> > >> if (res->state == rdatm_res_state_new) { > >> - if (!res->replay) { > >> + if (qp->resp.length == 0) { > >> + mr = NULL; > >> + qp->resp.mr = NULL; > >> + } else if (!res->replay) { > >> mr = qp->resp.mr; > >> qp->resp.mr = NULL; > >> } else { > >> @@ -980,9 +994,13 @@ static enum resp_states read_reply(struct rxe_qp *qp, > >> else > >> opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; > >> } else { > >> - mr = rxe_recheck_mr(qp, res->read.rkey); > >> - if (!mr) > >> - return RESPST_ERR_RKEY_VIOLATION; > >> + if (qp->resp.length == 0) { > >> + mr = NULL; > >> + } else { > >> + mr = rxe_recheck_mr(qp, res->read.rkey); > >> + if (!mr) > >> + return RESPST_ERR_RKEY_VIOLATION; > >> + } > >> > >> if (res->read.resid > mtu) > >> opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; > >> -- > >> 2.37.2 > >> > > Zhu, > > It relates since he is checking for NULL MRs. But I don't think it addresses the root > causes. The patch I sent should eliminate NULL MRs together with length != 0 in > the copy routines. I added WARN_ON's in case someone changes things later and > we hit this again. (A warning is more useful than a fault which can be very hard > to diagnose.) > > The two changes I made that attack the cause of problems are > (1) clearing qp->resp.mr in check_rkey() in the alternate paths. The primary > path demands that it get set with a valid mr. But on the alternate paths it isn't > set at all and can leave with a stale, invalid or wrong mr value. > (2) in read_reply() there is an error path where a zero length read fails to get > acked and the requester retries the operation and sends a second request. This > will end up in read_reply and as currently written attempt to lookup the rkey and > turn it into an MR but no valid rkey is required in a zero length operation so this > is likely to fail. The fixes treats length == 0 as a special case and force a NULL mr. > This should not trigger a fault in the mr copy/etc. routines since they always > check for length == 0 and return or require a non zero length. > > Thanks, > > Bob > >