Hi Ming,

On 05/14/2018 05:38 PM, Ming Lei wrote:
>> Here is the deadlock scenario.
>>
>> nvme_eh_work // EH0
>>   -> nvme_reset_dev                  // hold reset_lock
>>     -> nvme_setup_io_queues
>>       -> nvme_create_io_queues
>>         -> nvme_create_queue
>>           -> set nvmeq->cq_vector
>>           -> adapter_alloc_cq
>>           -> adapter_alloc_sq
>>              irq has not been requested
>>              io timeout
>>                                      nvme_eh_work // EH1
>>                                        -> nvme_dev_disable
>>                                          -> quiesce the adminq // ----> here !
>>                                          -> nvme_suspend_queue
>>                                             print out warning "Trying to free already-free IRQ 133"
>>                                          -> nvme_cancel_request // complete the timed-out admin request
>>                                        -> require reset_lock
>>           -> adapter_delete_cq
> If the admin IO submitted in adapter_alloc_sq() is timed out,
> nvme_dev_disable() in EH1 will complete it which is set as REQ_FAILFAST_DRIVER,
> then adapter_alloc_sq() should return error, and the whole reset in EH0
> should have been terminated immediately.

Please refer to the following code segment:

static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
{
    struct nvme_dev *dev = nvmeq->dev;
    int result;
...
    nvmeq->cq_vector = dev->num_vecs == 1 ? 0 : qid;
    result = adapter_alloc_cq(dev, qid, nvmeq);
    if (result < 0)
        goto release_vector;

    result = adapter_alloc_sq(dev, qid, nvmeq);  // if it times out and fails here
    if (result < 0)
        goto release_cq;

    nvme_init_queue(nvmeq, qid);
    result = queue_request_irq(nvmeq);
    if (result < 0)
        goto release_sq;

    return result;

 release_sq:
    dev->online_queues--;
    adapter_delete_sq(dev, qid);
 release_cq:                                     // we will end up here !
    adapter_delete_cq(dev, qid);                 // another cq delete admin command will be sent out.
 release_vector:
    nvmeq->cq_vector = -1;
    return result;
}

>
> I guess the issue should be that nvme_create_io_queues() ignores the failure.
>
> Could you dump the stack trace of EH0 reset task? So that we may see
> where EH0 reset kthread hangs.

root@will-ThinkCentre-M910s:/home/will/Desktop# cat /proc/2273/stack
[<0>] blk_execute_rq+0xf7/0x150
[<0>] __nvme_submit_sync_cmd+0x94/0x110
[<0>] nvme_submit_sync_cmd+0x1b/0x20
[<0>] adapter_delete_queue+0xad/0xf0
[<0>] nvme_reset_dev+0x1b67/0x2450
[<0>] nvme_eh_work+0x19c/0x4b0
[<0>] process_one_work+0x3ca/0xaa0
[<0>] worker_thread+0x89/0x6c0
[<0>] kthread+0x18d/0x1e0
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

root@will-ThinkCentre-M910s:/home/will/Desktop# cat /proc/2275/stack
[<0>] nvme_eh_work+0x11a/0x4b0
[<0>] process_one_work+0x3ca/0xaa0
[<0>] worker_thread+0x89/0x6c0
[<0>] kthread+0x18d/0x1e0
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

>
>> -> adapter_delete_queue // submit to the adminq which has been quiesced.
>>   -> nvme_submit_sync_cmd
>>     -> blk_execute_rq
>>       -> wait_for_completion_io_timeout
>>          hang_check is true, so there is no hung task warning for this context
>>
>> EH0 submits a cq delete admin command, but it will never be completed or timed out, because the admin request queue has
>> been quiesced, so the reset_lock cannot be released, and EH1 cannot get the reset_lock to move things forward.
> The nvme_dev_disable() in outer EH(EH1 in above log) will complete all
> admin command, which won't be retried because it is set as
> REQ_FAILFAST_DRIVER, so nvme_cancel_request() will complete it in
> nvme_dev_disable().

This cq delete admin command is sent out after EH1's nvme_dev_disable() has completed and failed the previous timed-out sq alloc admin command. Please refer to the code segment above.

Thanks
Jianchao
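
P.S. For reference, my understanding of why the timed-out sq alloc command is failed (not retried), so adapter_alloc_sq() returns an error and we fall into the release_cq path. The excerpts below are abridged from drivers/nvme/host/core.c as I read them in the tree under test; exact details may differ in other trees.

struct request *nvme_alloc_request(struct request_queue *q,
        struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid)
{
    ...
    // every sync admin command is marked failfast at allocation time
    req->cmd_flags |= REQ_FAILFAST_DRIVER;
    nvme_req(req)->cmd = cmd;

    return req;
}

void nvme_cancel_request(struct request *req, void *data, bool reserved)
{
    ...
    // nvme_dev_disable() in EH1 cancels the timed-out sq alloc command here
    nvme_req(req)->status = NVME_SC_ABORT_REQ;
    blk_mq_complete_request(req);
}

static inline bool nvme_req_needs_retry(struct request *req)
{
    if (blk_noretry_request(req))   // true, because REQ_FAILFAST_DRIVER is set
        return false;
    ...
}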