On 10/23/22 21:25, Zhu Yanjun wrote:
> On Mon, Oct 24, 2022 at 2:05 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>>
>> On 10/21/22 20:09, Li Zhijian wrote:
>>>
>>>
>>> On 21/10/2022 22:39, Zhu Yanjun wrote:
>>>> On Fri, Oct 21, 2022 at 3:53 PM Li Zhijian <lizhijian@xxxxxxxxxxx> wrote:
>>>>> Before the testing, we already passed it to rxe_mr_copy() where mr could
>>>>> be dereferenced. So this check is not exactly correct.
>>>>>
>>>>> I tried to figure out the details of how/when mr could be NULL, but failed
>>>>> in the end. Add a WARN_ON(!mr) to that path to tell us more when it
>>>>> happens.
>>>> If I get you correctly, you confronted a problem,
>>> Not exactly. I removed the mr check since I think this check is not correct.
>>> The newly added WARN_ON(!mr) is at the only place where mr can be NULL but is not handled correctly.
>>> With or without this patch, once WARN_ON(!mr) is triggered, the kernel will go wrong.
>>>
>>> So I want to place this WARN_ON(!mr) to point to the problem.
>>>
>>> Thanks
>>> Zhijian
>>>
>>>> but you can not figure it out.
>>>> So you send it upstream as a patch?
>>>>
>>>> I am not sure if it is a good idea.
>>>>
>>>> Zhu Yanjun
>>>>
>>>>> Signed-off-by: Li Zhijian <lizhijian@xxxxxxxxxxx>
>>>>> ---
>>>>>  drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>>> index ed5a09e86417..218c14fb07c6 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>>> @@ -778,6 +778,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>>  	if (res->state == rdatm_res_state_new) {
>>>>>  		if (!res->replay) {
>>>>>  			mr = qp->resp.mr;
>>>>> +			WARN_ON(!mr);
>>>>>  			qp->resp.mr = NULL;
>>>>>  		} else {
>>>>>  			mr = rxe_recheck_mr(qp, res->read.rkey);
>>>>> @@ -811,8 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>>
>>>>>  	rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>>>>  		    payload, RXE_FROM_MR_OBJ);
>>>>> -	if (mr)
>>>>> -		rxe_put(mr);
>>>>> +	rxe_put(mr);
>>>>>
>>>>>  	if (bth_pad(&ack_pkt)) {
>>>>>  		u8 *pad = payload_addr(&ack_pkt) + payload
>>>>> --
>>>>> 2.31.1
>>>>>
>>>
>>
>> Li is correct that the only way mr could be NULL is if qp->resp.mr == NULL. So the
>
> What I am concerned about is if "WARN_ON(!mr);" should be added or not.
> IMO, if the root cause remains unclear, this should be a problem.
> Currently this problem is not fixed. It is useless to send a debug
> statement to the mailing list.

Li was fixing a bug that no one ever saw. mr is not NULL in this case.

Bob

>
> Zhu Yanjun
>
>> 'if (mr)' is not needed if that is the case. The read_reply subroutine is reached
>> from a new RDMA read operation after going through check_rkey, or from a previous
>> RDMA read operation from get_req if qp->resp.res != NULL, or from a duplicate request
>> where the previous responder resource is found. In all these cases the mr is set.
>> Initially in check_rkey where if it can't find the mr it causes an RKEY_VIOLATION.
>> Thereafter the rkey is stored in the responder resources and looked up for each
>> packet to get an mr or cause an RKEY_VIOLATION. So the mr can't be NULL. I think
>> you can leave out the WARN and just drop the if (mr).
>>
>> Bob
>>
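
The argument in the thread (every path into read_reply either supplies a non-NULL mr or ends in an RKEY_VIOLATION first) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct mr`, `struct qp`, `lookup_mr`, `handle_read` and the two-value `enum resp_state` are hypothetical stand-ins invented for illustration, not the rxe driver's types or functions, and the refcount is a plain integer rather than the kernel's reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real rxe definitions. */
enum resp_state { RESP_OK, RESP_RKEY_VIOLATION };

struct mr {
	uint32_t rkey;
	int refcnt;
};

struct qp {
	struct mr *table;	/* registered MRs */
	size_t table_len;
	struct mr *resp_mr;	/* models qp->resp.mr */
};

/* Find an MR by rkey and take a reference; returns NULL if none matches. */
struct mr *lookup_mr(struct qp *qp, uint32_t rkey)
{
	for (size_t i = 0; i < qp->table_len; i++) {
		if (qp->table[i].rkey == rkey) {
			qp->table[i].refcnt++;
			return &qp->table[i];
		}
	}
	return NULL;
}

/* Models check_rkey: stores a non-NULL MR or reports a violation. */
enum resp_state check_rkey(struct qp *qp, uint32_t rkey)
{
	struct mr *mr = lookup_mr(qp, rkey);

	if (!mr)
		return RESP_RKEY_VIOLATION;
	qp->resp_mr = mr;
	return RESP_OK;
}

/* Models read_reply: a new read takes the MR stored by check_rkey,
 * a replay looks the MR up again by the saved rkey. */
enum resp_state read_reply(struct qp *qp, uint32_t rkey, int replay)
{
	struct mr *mr;

	if (!replay) {
		mr = qp->resp_mr;
		qp->resp_mr = NULL;
	} else {
		mr = lookup_mr(qp, rkey);
		if (!mr)
			return RESP_RKEY_VIOLATION;
	}

	/* ... the payload copy from mr would happen here ... */

	mr->refcnt--;		/* unconditional put, no 'if (mr)' */
	return RESP_OK;
}

/* Drives one read request the way the thread describes: a new request
 * passes through check_rkey first, a replay goes straight to read_reply. */
enum resp_state handle_read(struct qp *qp, uint32_t rkey, int replay)
{
	if (!replay) {
		enum resp_state s = check_rkey(qp, rkey);

		if (s != RESP_OK)
			return s;
	}
	return read_reply(qp, rkey, replay);
}
```

In this model an unknown rkey never reaches the unconditional put: it is rejected in `check_rkey` for a new request and in the replay lookup otherwise, which mirrors the reasoning for dropping the `if (mr)` test without adding a WARN.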