The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@xxxxxxxxxxxxxxx>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 99dc264014d5aed66ee37ddf136a38b5a2b1b529
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@xxxxxxxxxxxxxxx>' --in-reply-to '2023081225-impotence-uncurious-0ad9@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..

Possible dependencies:

99dc264014d5 ("nvme-tcp: fix potential unbalanced freeze & unfreeze")
9f27bd701d18 ("nvme: rename the queue quiescing helpers")
91c11d5f3254 ("nvme-rdma: stop auth work after tearing down queues in error recovery")
1f1a4f89562d ("nvme-tcp: stop auth work after tearing down queues in error recovery")
eac3ef262941 ("nvme-pci: split the initial probe from the rest path")
a6ee7f19ebfd ("nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable")
3f30a79c2e2c ("nvme-pci: set constant paramters in nvme_pci_alloc_ctrl")
2e87570be9d2 ("nvme-pci: factor out a nvme_pci_alloc_dev helper")
081a7d958ce4 ("nvme-pci: factor the iod mempool creation into a helper")
94cc781f69f4 ("nvme: move OPAL setup from PCIe to core")
cd50f9b24726 ("nvme: split nvme_kill_queues")
6bcd5089ee13 ("nvme: don't unquiesce the admin queue in nvme_kill_queues")
0ffc7e98bfaa ("nvme-pci: refactor the tagset handling in nvme_reset_work")
71b26083d59c ("block: set the disk capacity to 0 in blk_mark_disk_dead")

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 99dc264014d5aed66ee37ddf136a38b5a2b1b529 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@xxxxxxxxxx>
Date: Tue, 11 Jul 2023 17:40:40 +0800
Subject: [PATCH] nvme-tcp: fix potential unbalanced freeze & unfreeze

Move start_freeze into nvme_tcp_configure_io_queues(), and there is
at least two benefits:

1) fix unbalanced freeze and unfreeze, since re-connection work may
fail or be broken by removal

2) IO during error recovery can be failfast quickly because nvme fabrics
unquiesces queues after teardown.

One side-effect is that !mpath request may timeout during connecting
because of queue topo change, but that looks not one big deal:

1) same problem exists with current code base

2) compared with !mpath, mpath use case is dominant

Fixes: 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
Tested-by: Yi Zhang <yi.zhang@xxxxxxxxxx>
Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3e7dd6f91832..fb24cd8ac46c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1868,6 +1868,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 		goto out_cleanup_connect_q;
 
 	if (!new) {
+		nvme_start_freeze(ctrl);
 		nvme_unquiesce_io_queues(ctrl);
 		if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
 			/*
@@ -1876,6 +1877,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 			 * to be safe.
 			 */
 			ret = -ENODEV;
+			nvme_unfreeze(ctrl);
 			goto out_wait_freeze_timed_out;
 		}
 		blk_mq_update_nr_hw_queues(ctrl->tagset,
@@ -1980,7 +1982,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 	if (ctrl->queue_count <= 1)
 		return;
 	nvme_quiesce_admin_queue(ctrl);
-	nvme_start_freeze(ctrl);
 	nvme_quiesce_io_queues(ctrl);
 	nvme_sync_io_queues(ctrl);
 	nvme_tcp_stop_io_queues(ctrl);
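
For anyone preparing the backport, the freeze/unfreeze balance argument from the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct toy_ctrl, the toy_* helpers, the old_*/new_* functions and leaked_freezes() are all hypothetical names invented here, the "freeze depth" is reduced to a plain counter, and nothing below is kernel code or a description of how the block layer tracks freezes internally.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a controller; it only counts outstanding freezes. */
struct toy_ctrl {
	int freeze_depth;
};

static void toy_start_freeze(struct toy_ctrl *c) { c->freeze_depth++; }
static void toy_unfreeze(struct toy_ctrl *c) { c->freeze_depth--; }

/* Ordering before the patch: teardown starts the freeze. */
static void old_teardown(struct toy_ctrl *c)
{
	toy_start_freeze(c);
}

/* Before the patch, only the fully successful reconnect path unfreezes. */
static int old_configure(struct toy_ctrl *c, bool connect_ok, bool wait_freeze_ok)
{
	if (!connect_ok)
		return -1;	/* freeze taken in teardown stays outstanding */
	if (!wait_freeze_ok)
		return -2;	/* timed out: still no matching unfreeze */
	toy_unfreeze(c);
	return 0;
}

/* Ordering after the patch: teardown no longer freezes. */
static void new_teardown(struct toy_ctrl *c)
{
	(void)c;
}

/* After the patch, freeze and unfreeze are paired inside configure. */
static int new_configure(struct toy_ctrl *c, bool connect_ok, bool wait_freeze_ok)
{
	if (!connect_ok)
		return -1;	/* nothing was frozen, nothing to undo */
	toy_start_freeze(c);
	if (!wait_freeze_ok) {
		toy_unfreeze(c);	/* mirrors the nvme_unfreeze() added on timeout */
		return -2;
	}
	toy_unfreeze(c);
	return 0;
}

/* Run one teardown + reconnect attempt and report the leftover freeze depth. */
static int leaked_freezes(bool use_new, bool connect_ok, bool wait_freeze_ok)
{
	struct toy_ctrl c = { 0 };

	if (use_new) {
		new_teardown(&c);
		new_configure(&c, connect_ok, wait_freeze_ok);
	} else {
		old_teardown(&c);
		old_configure(&c, connect_ok, wait_freeze_ok);
	}
	return c.freeze_depth;
}
```

In this model, a failed reconnect under the old ordering leaves one freeze outstanding, while the new ordering returns to depth zero on every path, which is the balance property the commit message describes.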