On 2/5/2019 4:39 PM, Jason Gunthorpe wrote:
> On Thu, Jan 31, 2019 at 11:30:42AM -0800, Steve Wise wrote:
>> While running NVMe/oF wire unplug tests, we hit this warning in
>> kernel/workqueue.c:check_flush_dependency():
>>
>>     WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
>>               (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
>>               "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
>>               worker->current_pwq->wq->name, worker->current_func,
>>               target_wq->name, target_func);
>>
>> Which I think means we're flushing a workq that doesn't have
>> WQ_MEM_RECLAIM set, from workqueue context that does have it set.
>>
>> Looking at rdma_addr_cancel() which is doing the flushing, it flushes
>> the addr_wq which doesn't have MEM_RECLAIM set. Yet rdma_addr_cancel()
>> is being called by the nvme host connection timeout/reconnect workqueue
>> thread that does have WQ_MEM_RECLAIM set.
>
> Since we haven't learned anything more, I think you should look to
> remove either the WQ_MEM_RECLAIM or the rdma_addr_cancel() from the
> nvme side.

The nvme code is just calling rdma_destroy_id(), which in turn calls
rdma_addr_cancel(), so the flush can't be avoided from the nvme side.
I'll have to remove WQ_MEM_RECLAIM from the nvme reconnect workqueue
instead. I'll post a patch for that.
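For reference, the flag check behind this warning can be modeled in userspace roughly as follows. This is a sketch, not kernel code: the flag bit values are illustrative, WQ_LEGACY stands in for the kernel's __WQ_LEGACY, flush_would_warn() is a hypothetical helper, and it assumes check_flush_dependency() returns early when the workqueue being flushed itself has WQ_MEM_RECLAIM set.

```c
#include <stdbool.h>

/* Illustrative flag bits; the real values live in include/linux/workqueue.h. */
#define WQ_MEM_RECLAIM (1u << 3)
#define WQ_LEGACY      (1u << 18) /* stands in for the kernel's __WQ_LEGACY */

/*
 * Hypothetical userspace model of the WQ_MEM_RECLAIM check in
 * check_flush_dependency().
 *   in_worker:     true if the caller runs from a workqueue worker
 *   flusher_flags: flags of the workqueue the caller is running on
 *                  (e.g. the nvme reconnect workqueue)
 *   target_flags:  flags of the workqueue being flushed (e.g. addr_wq)
 * Returns true when the kernel would print the WARN_ONCE.
 */
static bool flush_would_warn(bool in_worker, unsigned int flusher_flags,
                             unsigned int target_flags)
{
    /* Assumed early return: flushing a reclaim-safe workqueue is fine. */
    if (target_flags & WQ_MEM_RECLAIM)
        return false;
    /* The quoted WARN_ONCE requires a current worker (worker != NULL). */
    if (!in_worker)
        return false;
    /* Warn only for a non-legacy WQ_MEM_RECLAIM flusher. */
    return (flusher_flags & (WQ_MEM_RECLAIM | WQ_LEGACY)) == WQ_MEM_RECLAIM;
}
```

In this model, flush_would_warn(true, WQ_MEM_RECLAIM, 0) is the reported case; once the reconnect workqueue is allocated without WQ_MEM_RECLAIM, flusher_flags drops that bit and the check no longer fires, even though addr_wq is unchanged.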