On Thursday, February 07, 2019 at 03:49:47 +0530, Steve Wise wrote:
>
> On 2/6/2019 3:52 PM, Steve Wise wrote:
> > On 2/5/2019 4:39 PM, Jason Gunthorpe wrote:
> >> On Thu, Jan 31, 2019 at 11:30:42AM -0800, Steve Wise wrote:
> >>> While running NVMe/oF wire unplug tests, we hit this warning in
> >>> kernel/workqueue.c:check_flush_dependency():
> >>>
> >>> WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> >>>           (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> >>>           "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
> >>>           worker->current_pwq->wq->name, worker->current_func,
> >>>           target_wq->name, target_func);
> >>>
> >>> Which I think means we're flushing a workq that doesn't have
> >>> WQ_MEM_RECLAIM set, from workqueue context that does have it set.
> >>>
> >>> Looking at rdma_addr_cancel() which is doing the flushing, it flushes
> >>> the addr_wq which doesn't have MEM_RECLAIM set. Yet rdma_addr_cancel()
> >>> is being called by the nvme host connection timeout/reconnect workqueue
> >>> thread that does have WQ_MEM_RECLAIM set.
> >> Since we haven't learned anything more, I think you should look to
> >> remove either the WQ_MEM_RECLAIM or the rdma_addr_cancel() from the
> >> nvme side.
> > The nvme code is just calling rdma_destroy_id(), which in turn calls
> > rdma_addr_cancel(), so I'll have to remove WQ_MEM_RECLAIM from the
> > workqueue.
> >
> > I'll post this patch then.
> >
> What a mess. If I remove RECLAIM for nvme_wq, then I'll regress this
> change, I think:
>
> c669ccdc50c2 ("nvme: queue ns scanning and async request from nvme_wq")
>

Hi All,

I see this issue with some basic tests. Looks like the discussion ended here.
Please suggest where this ideally needs to be fixed so that I can try fixing it.

Thanks,
Bharat