Re: [PATCH 04/17] nvme: don't call nvme_kill_queues from nvme_remove_namespaces

Sagi Grimberg <sagi@xxxxxxxxxxx> · Tue, 25 Oct 2022 23:17:04 +0300

On 10/25/22 20:43, Keith Busch wrote:
> On Tue, Oct 25, 2022 at 07:40:07AM -0700, Christoph Hellwig wrote:
>> @@ -4560,15 +4560,6 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>>   	/* prevent racing with ns scanning */
>>   	flush_work(&ctrl->scan_work);
>>   
>> -	/*
>> -	 * The dead states indicates the controller was not gracefully
>> -	 * disconnected. In that case, we won't be able to flush any data while
>> -	 * removing the namespaces' disks; fail all the queues now to avoid
>> -	 * potentially having to clean up the failed sync later.
>> -	 */
>> -	if (ctrl->state == NVME_CTRL_DEAD)
>> -		nvme_kill_queues(ctrl);
>> -
>>   	/* this is a no-op when called from the controller reset handler */
>>   	nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
>> 
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index ec034d4dd9eff..f971e96ffd3f6 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3249,6 +3249,16 @@ static void nvme_remove(struct pci_dev *pdev)
>>   
>>   	flush_work(&dev->ctrl.reset_work);
>>   	nvme_stop_ctrl(&dev->ctrl);
>> +
>> +	/*
>> +	 * The dead states indicates the controller was not gracefully
>> +	 * disconnected. In that case, we won't be able to flush any data while
>> +	 * removing the namespaces' disks; fail all the queues now to avoid
>> +	 * potentially having to clean up the failed sync later.
>> +	 */
>> +	if (dev->ctrl.state == NVME_CTRL_DEAD)
>> +		nvme_kill_queues(&dev->ctrl);
>> +
>>   	nvme_remove_namespaces(&dev->ctrl);
>>   	nvme_dev_disable(dev, true);
>>   	nvme_remove_attrs(dev);
>> --
>> 2.30.2
>
>
> We still need the flush_work(scan_work) prior to killing the queues. It
> looks like it could safely be moved to nvme_stop_ctrl(), which might
> make it easier on everyone if it were there.

If we do end up moving it to nvme_stop_ctrl(), can we make a sub-version
of nvme_stop_ctrl() that cannot block on I/O (i.e. without ana/scan/auth)?
That matters for multipathing, where we want to tear down the controller
quickly so we can fail over I/O asap.

IIRC this is why scan_work is not in nvme_stop_ctrl to begin with, but
it is also possible that there was some other deadlock caused by that.
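
To make the split concrete, here is a rough userspace model of the idea.
It is not kernel code and is untested against the driver: the model_*
names, the step enum, and STEP_NOIO_TEARDOWN (a placeholder for whatever
teardown never waits on I/O) are all hypothetical. It only shows the
shape: a fast helper with nothing that can block on I/O, and the full
stop built on top of it with the ana/scan/auth parts added.

```c
#include <stddef.h>

/* Userspace model only; none of these are the kernel's definitions. */
enum model_step {
	STEP_NOIO_TEARDOWN,	/* placeholder: teardown that never waits on I/O */
	STEP_ANA,		/* may block on I/O */
	STEP_SCAN,		/* may block on I/O */
	STEP_AUTH,		/* may block on I/O */
};

struct model_ctrl {
	enum model_step log[8];	/* steps executed, in order */
	size_t nr;		/* number of valid entries in log[] */
};

static void model_record(struct model_ctrl *ctrl, enum model_step step)
{
	ctrl->log[ctrl->nr++] = step;
}

/* Hypothetical fast variant: safe to call on the failover path. */
static void model_stop_ctrl_noio(struct model_ctrl *ctrl)
{
	model_record(ctrl, STEP_NOIO_TEARDOWN);
}

/* Full stop: the fast variant plus everything that may block on I/O. */
static void model_stop_ctrl(struct model_ctrl *ctrl)
{
	model_stop_ctrl_noio(ctrl);
	model_record(ctrl, STEP_ANA);
	model_record(ctrl, STEP_SCAN);
	model_record(ctrl, STEP_AUTH);
}
```

In this model a failover path would call only model_stop_ctrl_noio(),
while a removal path would call model_stop_ctrl() before killing the
queues. Whether the real ordering is deadlock-free is exactly the open
question above.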