Hello,

When I tried to make the NVMeOF initiator driver connect to an NVMeOF target
driver over mlx4 in RoCE mode the complaint shown below appeared. Has anyone
run into something similar?

Thanks,

Bart.

======================================================
WARNING: possible circular locking dependency detected
4.16.0-rc2-dbg+ #3 Not tainted
------------------------------------------------------
kworker/u64:1/341 is trying to acquire lock:
 (&(&iboe->lock)->rlock){+...}, at: [<0000000031cb0a02>] mlx4_ib_post_send+0x2be/0x1500 [mlx4_ib]

but task is already holding lock:
 (&(&qp->sq.lock)->rlock){....}, at: [<0000000024408bcf>] mlx4_ib_post_send+0x5f/0x1500 [mlx4_ib]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&(&qp->sq.lock)->rlock){....}:
       mlx4_ib_post_send+0x5f/0x1500 [mlx4_ib]
       ib_send_mad+0x200/0x410 [ib_core]
       ib_post_send_mad+0x5d/0x7f0 [ib_core]
       agent_send_response+0xfd/0x1d0 [ib_core]
       ib_mad_recv_done+0x63d/0x9a0 [ib_core]
       __ib_process_cq+0x72/0xc0 [ib_core]
       ib_cq_poll_work+0x1d/0x60 [ib_core]
       process_one_work+0x210/0x6a0
       worker_thread+0x3a/0x390
       kthread+0x11c/0x140
       ret_from_fork+0x24/0x30

-> #3 (&(&mad_queue->lock)->rlock){....}:
       ib_send_mad+0x1d4/0x410 [ib_core]
       ib_post_send_mad+0x5d/0x7f0 [ib_core]
       mlx4_ib_process_mad+0x2a6/0x480 [mlx4_ib]
       ib_mad_recv_done+0x2af/0x9a0 [ib_core]
       __ib_process_cq+0x72/0xc0 [ib_core]
       ib_cq_poll_work+0x1d/0x60 [ib_core]
       process_one_work+0x210/0x6a0
       worker_thread+0x3a/0x390
       kthread+0x11c/0x140
       ret_from_fork+0x24/0x30

-> #2 (&(&ibdev->sm_lock)->rlock){-...}:
       update_sm_ah+0xbf/0x130 [mlx4_ib]
       handle_port_mgmt_change_event+0x372/0x4b0 [mlx4_ib]
       mlx4_ib_event+0x513/0x5e0 [mlx4_ib]
       mlx4_dispatch_event+0x5f/0x90 [mlx4_core]
       mlx4_eq_int+0x48a/0xca0 [mlx4_core]
       mlx4_msi_x_interrupt+0xd/0x20 [mlx4_core]
       __handle_irq_event_percpu+0x41/0x390
       handle_irq_event_percpu+0x20/0x50
       handle_irq_event+0x34/0x60
       handle_edge_irq+0x85/0x1a0
       handle_irq+0xf2/0x160
       do_IRQ+0x63/0x120
       ret_from_intr+0x0/0x19
       __slab_alloc.isra.74+0x5c/0x80
       kmem_cache_alloc_trace+0x290/0x340
       mlx4_ib_create_ah+0x41/0x3c0 [mlx4_ib]
       _rdma_create_ah+0x17/0x40 [ib_core]
       ib_create_ah_from_wc+0x3e/0x50 [ib_core]
       agent_send_response+0x7a/0x1d0 [ib_core]
       ib_mad_recv_done+0x63d/0x9a0 [ib_core]
       __ib_process_cq+0x72/0xc0 [ib_core]
       ib_cq_poll_work+0x1d/0x60 [ib_core]
       process_one_work+0x210/0x6a0
       worker_thread+0x3a/0x390
       kthread+0x11c/0x140
       ret_from_fork+0x24/0x30

-> #1 (&(&priv->ctx_lock)->rlock){-...}:
       mlx4_get_protocol_dev+0x23/0x90 [mlx4_core]
       mlx4_ib_netdev_event+0x1af/0x290 [mlx4_ib]
       register_netdevice_notifier+0xc4/0x1d0
       mlx4_ib_add+0x11eb/0x13f0 [mlx4_ib]
       mlx4_add_device+0x42/0xd0 [mlx4_core]
       mlx4_register_interface+0x9a/0x110 [mlx4_core]
       release_port_group+0x52/0x80 [scsi_dh_alua]
       do_one_initcall+0x3b/0x14e
       do_init_module+0x5b/0x200
       load_module+0x21a2/0x2bc0
       SYSC_finit_module+0xb7/0xd0
       do_syscall_64+0x63/0x1a0
       entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> #0 (&(&iboe->lock)->rlock){+...}:
       _raw_spin_lock_irqsave+0x38/0x50
       mlx4_ib_post_send+0x2be/0x1500 [mlx4_ib]
       ib_send_mad+0x200/0x410 [ib_core]
       ib_post_send_mad+0x5d/0x7f0 [ib_core]
       ib_send_cm_req+0x662/0x890 [ib_cm]
       rdma_connect+0x1e3/0x560 [rdma_cm]
       nvme_rdma_cm_handler+0x192/0x820 [nvme_rdma]
       cma_work_handler+0x3f/0xb0 [rdma_cm]
       process_one_work+0x210/0x6a0
       worker_thread+0x3a/0x390
       kthread+0x11c/0x140
       ret_from_fork+0x24/0x30

other info that might help us debug this:

Chain exists of:
  &(&iboe->lock)->rlock --> &(&mad_queue->lock)->rlock --> &(&qp->sq.lock)->rlock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&qp->sq.lock)->rlock);
                               lock(&(&mad_queue->lock)->rlock);
                               lock(&(&qp->sq.lock)->rlock);
  lock(&(&iboe->lock)->rlock);

 *** DEADLOCK ***

6 locks held by kworker/u64:1/341:
 #0: ((wq_completion)"rdma_cm"){+.+.}, at: [<000000001b391178>] process_one_work+0x187/0x6a0
 #1: ((work_completion)(&work->work)#3){+.+.}, at: [<000000001b391178>] process_one_work+0x187/0x6a0
 #2: (&id_priv->handler_mutex){+.+.}, at: [<00000000b1914b97>] cma_work_handler+0x23/0xb0 [rdma_cm]
 #3: (&(&cm_id_priv->lock)->rlock){....}, at: [<00000000d5bd4b7c>] ib_send_cm_req+0x650/0x890 [ib_cm]
 #4: (&(&mad_queue->lock)->rlock){....}, at: [<0000000003890d17>] ib_send_mad+0x1d4/0x410 [ib_core]
 #5: (&(&qp->sq.lock)->rlock){....}, at: [<0000000024408bcf>] mlx4_ib_post_send+0x5f/0x1500 [mlx4_ib]

stack backtrace:
CPU: 9 PID: 341 Comm: kworker/u64:1 Not tainted 4.16.0-rc2-dbg+ #3
Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 2.5.4 01/22/2016
Workqueue: rdma_cm cma_work_handler [rdma_cm]
Call Trace:
 dump_stack+0x67/0x99
 print_circular_bug.isra.35+0x1ce/0x1db
 __lock_acquire+0x1285/0x1340
 lock_acquire+0x99/0x210
 _raw_spin_lock_irqsave+0x38/0x50
 mlx4_ib_post_send+0x2be/0x1500 [mlx4_ib]
 ib_send_mad+0x200/0x410 [ib_core]
 ib_post_send_mad+0x5d/0x7f0 [ib_core]
 ib_send_cm_req+0x662/0x890 [ib_cm]
 rdma_connect+0x1e3/0x560 [rdma_cm]
 nvme_rdma_cm_handler+0x192/0x820 [nvme_rdma]
 cma_work_handler+0x3f/0xb0 [rdma_cm]
 process_one_work+0x210/0x6a0
 worker_thread+0x3a/0x390
 kthread+0x11c/0x140
 ret_from_fork+0x24/0x30
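In case it helps while reading the report: the cycle lockdep complains about reduces to a plain ABBA inversion between qp->sq.lock and iboe->lock; the mad_queue->lock / sm_lock / ctx_lock entries in the chain are just the links through which the reverse ordering was established earlier. A minimal userspace sketch of that pattern (hypothetical lock and function names, not the actual mlx4 code) looks like this:

/* Minimal ABBA illustration. Lock "a" stands in for qp->sq.lock and lock
 * "b" for iboe->lock; the intermediate locks from the lockdep chain are
 * left out for brevity. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER; /* cf. qp->sq.lock */
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER; /* cf. iboe->lock  */

/* Path 1: analogous to the post-send path, which takes the SQ lock and
 * then needs the RoCE GID table lock. */
static void *path1(void *arg)
{
	pthread_mutex_lock(&a);
	pthread_mutex_lock(&b);
	pthread_mutex_unlock(&b);
	pthread_mutex_unlock(&a);
	return NULL;
}

/* Path 2: analogous to the earlier chain that ends up acquiring the SQ
 * lock while the other lock ordering is already recorded. */
static void *path2(void *arg)
{
	pthread_mutex_lock(&b);
	pthread_mutex_lock(&a);
	pthread_mutex_unlock(&a);
	pthread_mutex_unlock(&b);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, path1, NULL);
	pthread_create(&t2, NULL, path2, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("done (can deadlock under the wrong interleaving)\n");
	return 0;
}

With the right interleaving, path1 holds "a" while path2 holds "b" and each waits forever for the other, which is exactly the scenario the "Possible unsafe locking scenario" table above describes for the two spinlocks.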