> -----Original Message----- > From: Cheng Xu <chengyou@xxxxxxxxxxxxxxxxx> > Sent: Thursday, 14 July 2022 03:31 > To: jgg@xxxxxxxx; leon@xxxxxxxxxx; Bernard Metzler <BMT@xxxxxxxxxxxxxx> > Cc: linux-rdma@xxxxxxxxxxxxxxx; chengyou@xxxxxxxxxxxxxxxxx > Subject: [EXTERNAL] [PATCH for-next] RDMA/siw: Fix duplicated reported > IW_CM_EVENT_CONNECT_REPLY event > > If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't > been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY > in this case. This may trigger a call trace in iw_cm. A simple way to > trigger this: Great, thanks! I obviously did never hit an incomplete MPA hdr. Please make another change to fix it correctly, as suggested below. case of an incomplete > server: ib_send_lat > client: ib_send_lat -R <server_ip> > > The call trace looks like this: > > kernel BUG at drivers/infiniband/core/iwcm.c:894! > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > <...> > Workqueue: iw_cm_wq cm_work_handler [iw_cm] > Call Trace: > <TASK> > cm_work_handler+0x1dd/0x370 [iw_cm] > process_one_work+0x1e2/0x3b0 > worker_thread+0x49/0x2e0 > ? rescuer_thread+0x370/0x370 > kthread+0xe5/0x110 > ? kthread_complete_and_exit+0x20/0x20 > ret_from_fork+0x1f/0x30 > </TASK> > > Signed-off-by: Cheng Xu <chengyou@xxxxxxxxxxxxxxxxx> > --- > drivers/infiniband/sw/siw/siw_cm.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/infiniband/sw/siw/siw_cm.c > b/drivers/infiniband/sw/siw/siw_cm.c > index 17f34d584cd9..f88d2971c2c6 100644 > --- a/drivers/infiniband/sw/siw/siw_cm.c > +++ b/drivers/infiniband/sw/siw/siw_cm.c > @@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep) > enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR; > > rv = siw_recv_mpa_rr(cep); > - if (rv != -EAGAIN) > - siw_cancel_mpatimer(cep); > if (rv) > goto out_err; > > + siw_cancel_mpatimer(cep); > + Cancel the MPA timer only if we have a real error. -EAGAIN translates to just further waiting. So best to add the timer cancellation to the error bailout section. > rep = &cep->mpa.hdr; > > if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) { > @@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep) > } > > out_err: > - siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL); > + if (rv != -EAGAIN) { cancel MPA timer here. siw_cancel_mpatimer(cep); > + siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL); } > > return rv; > } > -- > 2.37.0