On 10/3/22 08:28, Keith Busch wrote:
On Mon, Oct 03, 2022 at 01:32:41PM +0000, Shinichiro Kawasaki wrote:
BTW, I came up to another question during code read. I found nvme_reset_work()
calls nvme_dev_disable() before nvme_sync_queues(). So, I think the NVME
controller is already disabled when the reset work calls nvme_sync_queues().
Right, everything previously outstanding has been reclaimed, and the queues are
quiesced at this point. There's nothing for timeout work to wait for, and the
sync is just ensuring every timeout work has returned.
It looks like a timeout is required in order to hit this reported deadlock, but
the driver ensures there's nothing to timeout prior to syncing the queues. I
don't think lockdep could reasonably know that, though.
Hi Keith,
Commit b2a0eb1a0ac7 ("nvme-pci: Remove watchdog timer") introduced the
nvme_dev_disable() and nvme_reset_ctrl() calls in the nvme_timeout()
function. Has it been considered to invoke these two calls asynchronously
instead of synchronously from the NVMe timeout handler (queue_work())?
Although it may require some work to make sure that this approach does not
trigger any race conditions, do you agree that this should be sufficient to
make lockdep happy?
Thanks,
Bart.