Re: [PATCH V5 0/9] nvme: pci: fix & improve timeout handling

On Wed, May 16, 2018 at 09:18:26AM -0600, Keith Busch wrote:
> On Wed, May 16, 2018 at 12:31:28PM +0800, Ming Lei wrote:
> > Hi Keith,
> > 
> > This issue may probably be fixed by Jianchao's patch of 'nvme: pci: set nvmeq->cq_vector
> > after alloc cq/sq'[1] and my another patch of 'nvme: pci: unquiesce admin
> > queue after controller is shutdown'[2], and both two have been included in the
> > posted V6.
> 
> No, it's definitely not related to that patch. The link is down in this
> test, I can assure you we're bailing out long before we ever even try to
> create an IO queue. The failing condition is detected by nvme_pci_enable's
> check for all 1's completions at the very beginning.

OK, this kind of failure during reset is easy to trigger in my test, and
nvme_remove_dead_ctrl() is called then too, but I don't see any IO hang
from the remove path.

As we discussed, that shouldn't happen: since the queues are unquiesced
and killed, all IO should fail immediately. Also, the controller has been
shut down and the queues are frozen too, so blk_mq_freeze_queue_wait()
won't be waiting on an unfrozen queue.

So could you post the debugfs log when the hang happens so that we may
find some clue?
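
In case it helps with collecting that, here is a minimal sketch that dumps
every readable file under a blk-mq debugfs directory. It assumes debugfs is
mounted at /sys/kernel/debug, the kernel is built with CONFIG_BLK_DEBUG_FS,
and the namespace shows up as nvme0n1; the helper name dump_debugfs() is
hypothetical and only for illustration.

```python
import os


def dump_debugfs(root):
    """Return {relative_path: contents} for every file under root.

    Files that cannot be read (for example write-only debugfs entries)
    are recorded with a short placeholder instead of aborting the dump.
    """
    dump = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            try:
                with open(path, "r", errors="replace") as f:
                    dump[rel] = f.read()
            except OSError as exc:
                dump[rel] = "<unreadable: %s>" % exc
    return dump
```

Running it as root with something like
dump_debugfs("/sys/kernel/debug/block/nvme0n1") while the hang is in progress
and printing each entry should capture the per-queue and per-hctx state in
one go. The device name here is an assumption and has to match the affected
namespace.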

Also, I don't think your issue is caused by this patchset, since
nvme_remove_dead_ctrl_work() and nvme_remove() aren't touched by it.
That means the issue can probably be triggered without this patchset too,
so could we start reviewing this patchset in the meantime?


Thanks,
Ming


