On Thu, Jun 29, 2023 at 02:48:18PM +0800, Ming Lei wrote:
> @@ -4054,8 +4055,14 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  	 * disconnected. In that case, we won't be able to flush any data while
>  	 * removing the namespaces' disks; fail all the queues now to avoid
>  	 * potentially having to clean up the failed sync later.
> +	 *
> +	 * If this removal happens during error recovering, resetting part
> +	 * may not be started, or controller isn't be recovered completely,
> +	 * so we have to treat controller as DEAD for avoiding IO hang since
> +	 * queues can be left as frozen and quiesced.
>  	 */
> -	if (ctrl->state == NVME_CTRL_DEAD) {
> +	if (ctrl->state == NVME_CTRL_DEAD ||
> +	    ctrl->old_state != NVME_CTRL_LIVE) {
>  		nvme_mark_namespaces_dead(ctrl);
>  		nvme_unquiesce_io_queues(ctrl);

Thanks for the comment and style fixes, but I still think doing the
state check was wrong to start with, and adding a check on the old
state makes things significantly worse.

Can we try to brainstorm on how to do this properly?

I think we need to first figure out how to balance the
quiesce/unquiesce calls; the placement of the
nvme_mark_namespaces_dead call should be the simple part.