On Fri, Oct 7, 2022 at 10:34 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>
> On 10/3/22 08:28, Keith Busch wrote:
> > On Mon, Oct 03, 2022 at 01:32:41PM +0000, Shinichiro Kawasaki wrote:
> >>
> >> BTW, I came up with another question while reading the code. I found
> >> that nvme_reset_work() calls nvme_dev_disable() before
> >> nvme_sync_queues(), so I think the NVMe controller is already disabled
> >> when the reset work calls nvme_sync_queues().
> >
> > Right, everything previously outstanding has been reclaimed, and the
> > queues are quiesced at this point. There's nothing for timeout work to
> > wait for, and the sync is just ensuring every timeout work has returned.
> >
> > It looks like a timeout is required in order to hit this reported
> > deadlock, but the driver ensures there's nothing to time out prior to
> > syncing the queues. I don't think lockdep could reasonably know that,
> > though.
>
> Hi Keith,
>
> Commit b2a0eb1a0ac7 ("nvme-pci: Remove watchdog timer") introduced the
> nvme_dev_disable() and nvme_reset_ctrl() calls in the nvme_timeout()
> function. Has it been considered to invoke these two calls asynchronously
> (via queue_work()) instead of synchronously from the NVMe timeout handler?
> Although it may require some work to make sure that this approach does not
> trigger any race conditions, do you agree that this should be sufficient
> to make lockdep happy?

We still have to sync whatever work does the reset, so that would just
shift which work the lockdep splat indicates.
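
To make that concrete, here is a rough, untested sketch of the kind of
deferral being suggested. The timeout_recovery_work field and handler are
made up for illustration (they don't exist in struct nvme_dev today);
nvme_dev_disable(), nvme_reset_ctrl(), nvme_wq and queue_work() are the
existing symbols. The work item would need INIT_WORK() during probe, and
the status/flag handling nvme_timeout() does around these calls is omitted:

	/* Hypothetical deferred handler for the timeout recovery path. */
	static void nvme_timeout_recovery_work(struct work_struct *work)
	{
		struct nvme_dev *dev =
			container_of(work, struct nvme_dev, timeout_recovery_work);

		/* Same two calls nvme_timeout() makes today, just deferred. */
		nvme_dev_disable(dev, false);
		nvme_reset_ctrl(&dev->ctrl);
	}

	/* In nvme_timeout(), instead of calling the two functions directly: */
		queue_work(nvme_wq, &dev->timeout_recovery_work);
		return BLK_EH_DONE;

Even with something like the above, nvme_reset_work() (or whoever tears the
controller down) still has to flush/sync that work item before it can safely
proceed, so the dependency lockdep is complaining about doesn't go away; it
just moves to the new work.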