On Mon, Mar 05, 2018 at 09:49:28PM -0800, Selvin Xavier wrote:
> Hitting the following hard lockup due to a race condition in
> error CQE processing.
>
> [26146.879798] bnxt_en 0000:04:00.0: QPLIB: FP: CQ Processed Req
> [26146.886346] bnxt_en 0000:04:00.0: QPLIB: wr_id[1251] = 0x0 with status 0xa
> [26156.350935] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
> [26156.357470] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace
> [26156.447957] CPU: 4 PID: 3413 Comm: kworker/4:1H Kdump: loaded
> [26156.457994] Hardware name: Dell Inc. PowerEdge R430/0CN7X8,
> [26156.466390] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [26156.472639] Call Trace:
> [26156.475379]  <NMI>  [<ffffffff98d0d722>] dump_stack+0x19/0x1b
> [26156.481833]  [<ffffffff9873f775>] watchdog_overflow_callback+0x135/0x140
> [26156.489341]  [<ffffffff9877f237>] __perf_event_overflow+0x57/0x100
> [26156.496256]  [<ffffffff98787c24>] perf_event_overflow+0x14/0x20
> [26156.502887]  [<ffffffff9860a580>] intel_pmu_handle_irq+0x220/0x510
> [26156.509813]  [<ffffffff98d16031>] perf_event_nmi_handler+0x31/0x50
> [26156.516738]  [<ffffffff98d1790c>] nmi_handle.isra.0+0x8c/0x150
> [26156.523273]  [<ffffffff98d17be8>] do_nmi+0x218/0x460
> [26156.528834]  [<ffffffff98d16d79>] end_repeat_nmi+0x1e/0x7e
> [26156.534980]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
> [26156.543268]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
> [26156.551556]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
> [26156.559842]  <EOE>  [<ffffffff98d083e4>] queued_spin_lock_slowpath+0xb/0xf
> [26156.567555]  [<ffffffff98d15690>] _raw_spin_lock+0x20/0x30
> [26156.573696]  [<ffffffffc08381a1>] bnxt_qplib_lock_buddy_cq+0x31/0x40 [bnxt_re]
> [26156.581789]  [<ffffffffc083bbaa>] bnxt_qplib_poll_cq+0x43a/0xf10 [bnxt_re]
> [26156.589493]  [<ffffffffc083239b>] bnxt_re_poll_cq+0x9b/0x760 [bnxt_re]
>
> The issue happens when RQ poll_cq, SQ poll_cq, or an async error event tries
> to put the error QP in the flush list. Since the SQ and RQ of each error QP
> are added to two different flush lists, we need to protect them using the
> locks of the corresponding CQs. A difference in the lock acquisition order
> between SQ poll_cq and RQ poll_cq can cause a hard lockup.
>
> Revisit the locking strategy and remove the usage of qplib_cq.hwq.lock.
> Instead, introduce qplib_cq.flush_lock to handle addition/deletion of QPs
> in the flush list. Also, always acquire the flush_lock in a fixed order
> (SQ CQ lock first, then RQ CQ lock) to avoid any potential deadlock.
>
> Other than the poll_cq context, the movement of a QP to/from the flush list
> can happen in the modify_qp context or from an async error event from HW.
> Synchronize these operations using the bnxt_re verbs layer CQ locks.
> To achieve this, add a callback from the HW abstraction layer (qplib) to
> the bnxt_re ib_verbs layer for async error events. Also, remove the
> buddy CQ functions as they are no longer required.
>
> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@xxxxxxxxxxxx>
> Signed-off-by: Somnath Kotur <somnath.kotur@xxxxxxxxxxxx>
> Signed-off-by: Devesh Sharma <devesh.sharma@xxxxxxxxxxxx>
> Signed-off-by: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c   |  11 ++-
>  drivers/infiniband/hw/bnxt_re/ib_verbs.h   |   3 +
>  drivers/infiniband/hw/bnxt_re/main.c       |   7 ++
>  drivers/infiniband/hw/bnxt_re/qplib_fp.c   | 109 +++++++----------------------
>  drivers/infiniband/hw/bnxt_re/qplib_fp.h   |  12 ++++
>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |   3 +-
>  6 files changed, 55 insertions(+), 90 deletions(-)

Applied to for-next

Thanks
Jason
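
[Editor's note] To make the fixed lock-ordering idea from the patch description concrete, below is a minimal sketch, not the actual bnxt_re code. The type and function names (sketch_cq, sketch_qp, flush_cq_lock_pair, flush_cq_unlock_pair) are hypothetical; only the ordering rule, take the send-CQ flush lock before the receive-CQ flush lock in every path, reflects the approach described above.

/*
 * Illustrative sketch only: shows how a pair of per-CQ flush locks can be
 * acquired in one fixed order (SQ's CQ first, then RQ's CQ) so that the
 * poll_cq, modify_qp, and async-error paths cannot deadlock against each
 * other when moving a QP to/from the flush lists.
 */
#include <linux/spinlock.h>

struct sketch_cq {
	spinlock_t flush_lock;		/* protects this CQ's flush list */
};

struct sketch_qp {
	struct sketch_cq *scq;		/* CQ serving the send queue */
	struct sketch_cq *rcq;		/* CQ serving the receive queue */
};

static void flush_cq_lock_pair(struct sketch_qp *qp, unsigned long *flags)
{
	/* Always take the SQ's CQ lock first, then the RQ's CQ lock;
	 * skip the second acquisition when both queues share one CQ. */
	spin_lock_irqsave(&qp->scq->flush_lock, *flags);
	if (qp->rcq != qp->scq)
		spin_lock(&qp->rcq->flush_lock);
}

static void flush_cq_unlock_pair(struct sketch_qp *qp, unsigned long *flags)
{
	/* Release in the reverse order of acquisition. */
	if (qp->rcq != qp->scq)
		spin_unlock(&qp->rcq->flush_lock);
	spin_unlock_irqrestore(&qp->scq->flush_lock, *flags);
}

Because every caller funnels through the same pair of helpers, no two contexts can hold the two flush locks in opposite orders, which is the inversion that produced the hard lockup in the trace above.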