On Wed, May 16, 2018 at 09:18:26AM -0600, Keith Busch wrote:
> On Wed, May 16, 2018 at 12:31:28PM +0800, Ming Lei wrote:
> > Hi Keith,
> >
> > This issue may well be fixed by Jianchao's patch 'nvme: pci: set
> > nvmeq->cq_vector after alloc cq/sq'[1] together with my other patch
> > 'nvme: pci: unquiesce admin queue after controller is shutdown'[2];
> > both have been included in the posted V6.
>
> No, it's definitely not related to that patch. The link is down in this
> test, I can assure you we're bailing out long before we ever even try to
> create an IO queue. The failing condition is detected by nvme_pci_enable's
> check for all 1's completions at the very beginning.

OK, this kind of failure during reset can easily be triggered in my test,
and nvme_remove_dead_ctrl() is called there too, but I don't see an IO hang
in the remove path. As we discussed, the hang shouldn't happen: since the
queues are unquiesced and killed, all IO should fail immediately. Also,
the controller has been shut down and the queues are frozen, so
blk_mq_freeze_queue_wait() won't wait on an unfrozen queue.

So could you post the debugfs log from when the hang happens, so that we
may find some clue?

Also, I don't think your issue is caused by this patchset, since
nvme_remove_dead_ctrl_work() and nvme_remove() aren't touched by it. That
means the issue can probably be triggered without this patchset too, so
could we start reviewing the patchset in the meantime?

Thanks,
Ming