If the bnxt FW misbehaves, for example because of a FW bug, it can send
completions for old cookies that the bnxt driver has already handled.
If such a cookie is associated with a calling context that has already
returned, the driver tries to access that caller's memory again,
because it never clears the is_waiter_alive flag after the caller
successfully finishes waiting. This stale access causes the following
kernel panic:

 Call Trace:
  <IRQ>
  ? __die+0x20/0x70
  ? page_fault_oops+0x75/0x170
  ? exc_page_fault+0xaa/0x140
  ? asm_exc_page_fault+0x22/0x30
  ? bnxt_qplib_process_qp_event.isra.0+0x20c/0x3a0 [bnxt_re]
  ? srso_return_thunk+0x5/0x5f
  ? __wake_up_common+0x78/0xa0
  ? srso_return_thunk+0x5/0x5f
  bnxt_qplib_service_creq+0x18d/0x250 [bnxt_re]
  tasklet_action_common+0xac/0x210
  handle_softirqs+0xd3/0x2b0
  __irq_exit_rcu+0x9b/0xc0
  common_interrupt+0x7f/0xa0
  </IRQ>
  <TASK>

To avoid this, clear the is_waiter_alive flag every time the caller
finishes waiting for a completion, not only when the wait fails.

Fixes: 691eb7c6110f ("RDMA/bnxt_re: handle command completions after driver detect a timedout")
Signed-off-by: Mohammad Heib <mheib@xxxxxxxxxx>
---
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index f5713e3c39fb..eaf92029862b 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -511,15 +511,15 @@ static int __bnxt_qplib_rcfw_send_message(struct bnxt_qplib_rcfw *rcfw,
 	else
 		rc = __poll_for_resp(rcfw, cookie);
 
-	if (rc) {
-		spin_lock_irqsave(&rcfw->cmdq.hwq.lock, flags);
-		crsqe = &rcfw->crsqe_tbl[cookie];
-		crsqe->is_waiter_alive = false;
-		if (rc == -ENODEV)
-			set_bit(FIRMWARE_STALL_DETECTED, &rcfw->cmdq.flags);
-		spin_unlock_irqrestore(&rcfw->cmdq.hwq.lock, flags);
+
+	spin_lock_irqsave(&rcfw->cmdq.hwq.lock, flags);
+	crsqe = &rcfw->crsqe_tbl[cookie];
+	crsqe->is_waiter_alive = false;
+	if (rc == -ENODEV)
+		set_bit(FIRMWARE_STALL_DETECTED, &rcfw->cmdq.flags);
+	spin_unlock_irqrestore(&rcfw->cmdq.hwq.lock, flags);
+	if (rc)
 		return -ETIMEDOUT;
-	}
 
 	if (evnt->status) {
 		/* failed with status */
-- 
2.34.3