On Mon, Mar 18, 2019 at 09:55:09AM -0700, Dennis Dalessandro wrote:
> From: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>
>
> The work_item cancels that occur when a QP is destroyed
> can elicit the following trace:
>
> [ 708.997199] workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1]
> [ 708.997209] WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100
> [ 709.227743] Call Trace:
> [ 709.230852] __flush_work.isra.29+0x8c/0x1a0
> [ 709.235779] ? __switch_to_asm+0x40/0x70
> [ 709.240335] __cancel_work_timer+0x103/0x190
> [ 709.245253] ? schedule+0x32/0x80
> [ 709.249216] iowait_cancel_work+0x15/0x30 [hfi1]
> [ 709.254475] rvt_reset_qp+0x1f8/0x3e0 [rdmavt]
> [ 709.259554] rvt_destroy_qp+0x65/0x1f0 [rdmavt]
> [ 709.264703] ? _cond_resched+0x15/0x30
> [ 709.269081] ib_destroy_qp+0xe9/0x230 [ib_core]
> [ 709.274223] ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib]
> [ 709.279799] process_one_work+0x171/0x370
> [ 709.284425] worker_thread+0x49/0x3f0
> [ 709.288695] kthread+0xf8/0x130
> [ 709.292450] ? max_active_store+0x80/0x80
> [ 709.297050] ? kthread_bind+0x10/0x10
> [ 709.301293] ret_from_fork+0x35/0x40
> [ 709.305441] ---[ end trace f0e973737146499b ]---
>
> Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM.

This seems like the same problem as the nvme patches..

Nobody seems to know what the rules are for using WQ_MEM_RECLAIM.

AFAIK it has nothing to do with freeing memory though, that is a new
one..

Are you sure cm_tx_reap shouldn't lose its reclaim flag?

Jason
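The rule behind the warning in the trace can be modelled roughly as below. This is a simplified userspace sketch, not the kernel code: `struct model_wq`, `MODEL_WQ_MEM_RECLAIM` and `flush_would_warn()` are hypothetical names, and the sketch leaves out the `__WQ_LEGACY` exemption and the `PF_MEMALLOC` case that `check_flush_dependency()` in kernel/workqueue.c also handles. It only shows why either option in this thread silences this particular warning: adding WQ_MEM_RECLAIM to the target (hfi1) queue, or dropping it from the flushing (ipoib) queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag for this model only; not the kernel's WQ_MEM_RECLAIM value. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Hypothetical stand-in for struct workqueue_struct: just a name and flags. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Simplified model of the dependency check that fired in the trace.
 * running_wq is the workqueue whose worker does the flush or synchronous
 * cancel (NULL if the caller is not a workqueue worker); target_wq is the
 * workqueue the waited-on work was queued to. Returns true when the kernel
 * would print "WQ_MEM_RECLAIM ... is flushing !WQ_MEM_RECLAIM ...".
 */
static bool flush_would_warn(const struct model_wq *running_wq,
			     const struct model_wq *target_wq)
{
	assert(target_wq != NULL);

	/* A WQ_MEM_RECLAIM target has a rescuer thread, so waiting on it is allowed. */
	if (target_wq->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* Otherwise only a WQ_MEM_RECLAIM worker waiting on it is flagged. */
	return running_wq != NULL &&
	       (running_wq->flags & MODEL_WQ_MEM_RECLAIM) != 0;
}
```

With ipoib_wq flagged and hfi0_0 unflagged, as in the report, the call returns true; flipping either flag makes it return false. The model says nothing about which of the two flags is the right one to change, which is the open question here.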