On Mon, May 21, 2018 at 09:03:31AM -0600, Keith Busch wrote:
> On Mon, May 21, 2018 at 10:58:51PM +0800, Ming Lei wrote:
> > On Mon, May 21, 2018 at 08:22:19AM -0600, Keith Busch wrote:
> > > On Sat, May 19, 2018 at 07:03:58AM +0800, Ming Lei wrote:
> > > > On Fri, May 18, 2018 at 10:38:20AM -0600, Keith Busch wrote:
> > > > > +
> > > > > +	if (unfreeze)
> > > > > +		nvme_wait_freeze(&dev->ctrl);
> > > > > +
> > > >
> > > > A timeout may come just before or during blk_mq_update_nr_hw_queues()
> > > > or the above nvme_wait_freeze(), and then both may hang forever.
> > >
> > > Why would it hang forever? The scan_work doesn't stop a timeout from
> > > triggering a reset to reclaim requests necessary to complete a freeze.
> >
> > nvme_dev_disable() will quiesce queues, then nvme_wait_freeze() or
> > blk_mq_update_nr_hw_queues() may hang forever.
>
> nvme_dev_disable is just the first part of the timeout sequence. You
> have to follow it through to the reset_work that either restarts or
> kills the queues.

nvme_dev_disable() quiesces the queues first, before any queues are
killed.

If the queues are quiesced before or while nvme_wait_freeze() runs in
the 2nd part of the reset, that 2nd part can't make progress and the
I/O hangs. In the end, no further reset can be scheduled at all.

Thanks,
Ming
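
P.S. To make the ordering concrete, here is a toy userspace model of the
hang described above. It is only a sketch, not kernel code: the struct
and the toy_* helpers are made-up names, and the model assumes the only
relevant facts are that a quiesced queue does not dispatch and that a
freeze wait finishes only once every entered request has completed.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_queue {
	bool quiesced;	/* dispatch is blocked while this is set */
	int pending;	/* requests entered but not yet completed */
};

/* One scheduling step: a non-quiesced queue completes one request. */
static void toy_run_queue(struct toy_queue *q)
{
	if (!q->quiesced && q->pending > 0)
		q->pending--;
}

/*
 * Bounded stand-in for a freeze wait. The real wait has no step limit,
 * so a "false" result here corresponds to hanging forever.
 */
static bool toy_wait_freeze(struct toy_queue *q, int max_steps)
{
	for (int i = 0; i < max_steps && q->pending > 0; i++)
		toy_run_queue(q);
	return q->pending == 0;
}
```

With pending = 3, toy_wait_freeze() drains the queue when quiesced is
false, but never makes progress when quiesced is true, no matter how
large max_steps is. In this model, that is the state the 2nd part of
the reset is stuck in.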