> -----Original Message----- > From: Cheng Xu <chengyou@xxxxxxxxxxxxxxxxx> > Sent: Thursday, 14 July 2022 15:20 > To: Bernard Metzler <BMT@xxxxxxxxxxxxxx>; jgg@xxxxxxxx; leon@xxxxxxxxxx > Cc: linux-rdma@xxxxxxxxxxxxxxx > Subject: [EXTERNAL] Re: [PATCH for-next] RDMA/siw: Fix duplicated reported > IW_CM_EVENT_CONNECT_REPLY event > > > > On 7/14/22 8:59 PM, Bernard Metzler wrote: > >> -----Original Message----- > >> From: Cheng Xu <chengyou@xxxxxxxxxxxxxxxxx> > >> Sent: Thursday, 14 July 2022 03:31 > >> To: jgg@xxxxxxxx; leon@xxxxxxxxxx; Bernard Metzler <BMT@xxxxxxxxxxxxxx> > >> Cc: linux-rdma@xxxxxxxxxxxxxxx; chengyou@xxxxxxxxxxxxxxxxx > >> Subject: [EXTERNAL] [PATCH for-next] RDMA/siw: Fix duplicated reported > >> IW_CM_EVENT_CONNECT_REPLY event > >> > >> If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't > >> been received completely, and should not report > IW_CM_EVENT_CONNECT_REPLY > >> in this case. This may trigger a call trace in iw_cm. A simple way to > >> trigger this: > > > > Great, thanks! I obviously did never hit an incomplete > > MPA hdr. Please make another change to fix it correctly, > > as suggested below. > > > > > > case of an incomplete > >> server: ib_send_lat > >> client: ib_send_lat -R <server_ip> > >> > >> The call trace looks like this: > >> > >> kernel BUG at drivers/infiniband/core/iwcm.c:894! > >> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > >> <...> > >> Workqueue: iw_cm_wq cm_work_handler [iw_cm] > >> Call Trace: > >> <TASK> > >> cm_work_handler+0x1dd/0x370 [iw_cm] > >> process_one_work+0x1e2/0x3b0 > >> worker_thread+0x49/0x2e0 > >> ? rescuer_thread+0x370/0x370 > >> kthread+0xe5/0x110 > >> ? kthread_complete_and_exit+0x20/0x20 > >> ret_from_fork+0x1f/0x30 > >> </TASK> > >> > >> Signed-off-by: Cheng Xu <chengyou@xxxxxxxxxxxxxxxxx> > >> --- > >> drivers/infiniband/sw/siw/siw_cm.c | 7 ++++--- > >> 1 file changed, 4 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/infiniband/sw/siw/siw_cm.c > >> b/drivers/infiniband/sw/siw/siw_cm.c > >> index 17f34d584cd9..f88d2971c2c6 100644 > >> --- a/drivers/infiniband/sw/siw/siw_cm.c > >> +++ b/drivers/infiniband/sw/siw/siw_cm.c > >> @@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep) > >> enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR; > >> > >> rv = siw_recv_mpa_rr(cep); > >> - if (rv != -EAGAIN) > >> - siw_cancel_mpatimer(cep); > >> if (rv) > >> goto out_err; > >> > >> + siw_cancel_mpatimer(cep); > >> + > > > > Cancel the MPA timer only if we have a > > real error. -EAGAIN translates to just > > further waiting. So best to add the timer > > cancellation to the error bailout section. > > > >> rep = &cep->mpa.hdr; > >> > >> if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) { > >> @@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep) > >> } > >> > >> out_err: > >> - siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL); > >> + if (rv != -EAGAIN) > > { > > cancel MPA timer here. > > Indeed we do not need it here, because when siw_proc_mpareply returns error > but not -EAGAIN, the release_cep will be set in the caller > (siw_cm_work_handler), > and siw_cancel_mpatimer will be called in the error handle flow. > > I think this is better, because the error handle is more unified. Yes, sorry, your original patch is correct. > > How do you think? > > Thanks, > Cheng Xu > > > > siw_cancel_mpatimer(cep); > >> + siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL); > > } > >> > >> return rv; > >> } > >> -- > >> 2.37.0 > > Thank you! Acked-by: Bernard Metzler <bmt@xxxxxxxxxxxxxx>