Hi,

For nvme-pci, after the controller is recovered, the reset handler waits
for in-flight IOs to complete before updating the number of hw queues.
If a new controller error happens during this period, the nvme-pci
driver deletes the controller and fails the in-flight IOs. This is too
violent and unfriendly from the user's point of view.

Add APIs for checking whether a queue is frozen, and replace
nvme_wait_freeze() in the nvme-pci reset handler with a check of whether
all ns queues are frozen and whether the controller has been disabled.
A fresh reset can then be scheduled to handle a new controller error
that occurs while waiting for in-flight IO completion, so deleting the
controller and failing IOs can be avoided in this situation.

Without these patches, when fail io timeout injection is run, the
controller can be removed very quickly. With these patches, no
controller removal is observed, and the controller recovers to its
normal state once io timeout failure injection is stopped.

V2:
	- give up after retrying enough times
	- add comment on breaking because of shutdown

Ming Lei (3):
  blk-mq: add API of blk_mq_queue_frozen
  nvme: add nvme_frozen
  nvme-pci: make nvme reset more reliable

 block/blk-mq.c           |  6 +++++
 drivers/nvme/host/core.c | 17 +++++++++++++-
 drivers/nvme/host/nvme.h |  3 +++
 drivers/nvme/host/pci.c  | 50 +++++++++++++++++++++++++++++++++-------
 include/linux/blk-mq.h   |  1 +
 5 files changed, 68 insertions(+), 9 deletions(-)

Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
Cc: Keith Busch <kbusch@xxxxxxxxxx>
Cc: Max Gurtovoy <maxg@xxxxxxxxxxxx>

-- 
2.25.2