[PATCH V2 0/3] blk-mq/nvme: improve nvme-pci reset handler

Ming Lei <ming.lei@xxxxxxxxxx> · Sat, 30 May 2020 21:52:18 +0800

Hi,

For nvme-pci, after controller is recovered, in-flight IOs are waited
before updating nr hw queues. If new controller error happens during
this period, nvme-pci driver deletes the controller and fails in-flight
IO. This way is too violent, and not friendly from user viewpoint.

Add APIs for checking if queue is frozen, and replace nvme_wait_freeze
in nvme-pci reset handler with checking if all ns queues are frozen &
controller disabled. Then a fresh new reset can be scheduled for
handling new controller error during waiting for in-flight IO completion.

So deleting controller & failing IOs can be avoided in this situation.

Without this patches, when fail io timeout injection is run, the
controller can be removed very quickly. With this patch, no controller
removing can be observed, and controller can recover to normal state
after stopping to inject io timeout failure.

V2:
	- give up after retrying enough times
	- add comment on breaking because of shutdown

Ming Lei (3):
  blk-mq: add API of blk_mq_queue_frozen
  nvme: add nvme_frozen
  nvme-pci: make nvme reset more reliable

 block/blk-mq.c           |  6 +++++
 drivers/nvme/host/core.c | 17 +++++++++++++-
 drivers/nvme/host/nvme.h |  3 +++
 drivers/nvme/host/pci.c  | 50 +++++++++++++++++++++++++++++++++-------
 include/linux/blk-mq.h   |  1 +
 5 files changed, 68 insertions(+), 9 deletions(-)

Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
Cc: Keith Busch <kbusch@xxxxxxxxxx>
Cc: Max Gurtovoy <maxg@xxxxxxxxxxxx>

-- 
2.25.2