On 10/4/22 19:44, Shinichiro Kawasaki wrote: > On Oct 03, 2022 / 09:28, Keith Busch wrote: >> On Mon, Oct 03, 2022 at 01:32:41PM +0000, Shinichiro Kawasaki wrote: >>> >>> BTW, I came up to another question during code read. I found nvme_reset_work() >>> calls nvme_dev_disable() before nvme_sync_queues(). So, I think the NVME >>> controller is already disabled when the reset work calls nvme_sync_queues(). >> >> Right, everything previously outstanding has been reclaimed, and the queues are >> quiesced at this point. There's nothing for timeout work to wait for, and the >> sync is just ensuring every timeout work has returned. > > Thank you Keith, good to know that we do not need to worry about the deadlock > scenario with nvme_reset_work() :) I also checked nvme_reset_prepare() and > nvme_suspend(). They also call nvme_dev_disable() or nvme_start/wait_freeze() > before nvme_sync_queues(). So I think the queues are quiesced in these context > also, and do not worry them either. > >> >> It looks like a timeout is required in order to hit this reported deadlock, but >> the driver ensures there's nothing to timeout prior to syncing the queues. I >> don't think lockdep could reasonably know that, though. > > I see... From blktests point of view, I would like to suppress the lockdep splat > which triggers the block/011 failure. I created a quick patch using > lockdep_off() and lockdep_on() which skips lockdep check for the read lock in > nvme_sync_io_queues() [1] and confirmed it avoids the lockdep splat. I think > this lockdep check relax is fine because nvme_sync_io_queues() callers do call > nvme_dev_disable(), nvme_start/wait_freeze() or nvme_stop_queues(). The queues > are quiesced at nvme_sync_io_queues() and the deadlock scenario does not happen. > > Any comment on this patch will be appreciated. If this action approach is ok, > I'll post as a formal patch for review. > > [1] > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 66446f1e06c..9d091327a88 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -5121,10 +5121,14 @@ void nvme_sync_io_queues(struct nvme_ctrl *ctrl) > { > struct nvme_ns *ns; > I think there should be a comment here explaining why your turn off lockdep. > + lockdep_off(); > down_read(&ctrl->namespaces_rwsem); > + lockdep_on(); > list_for_each_entry(ns, &ctrl->namespaces, list) > blk_sync_queue(ns->queue); > + lockdep_off(); > up_read(&ctrl->namespaces_rwsem); > + lockdep_on(); > } > EXPORT_SYMBOL_GPL(nvme_sync_io_queues); > -- Damien Le Moal Western Digital Research