On Wed, May 20, 2020 at 07:56:52PM +0800, Ming Lei wrote:
> Hi,
>
> For nvme-pci, after the controller is recovered, the reset handler waits
> for in-flight IOs before updating nr hw queues. If a new controller error
> happens during this period, the nvme-pci driver deletes the controller and
> fails the in-flight IO. This is too violent, and not friendly from the
> user's viewpoint.
>
> Add APIs for checking if a queue is frozen, and replace nvme_wait_freeze
> in the nvme-pci reset handler with a check of whether all ns queues are
> frozen & the controller is disabled. Then a fresh reset can be scheduled
> to handle a new controller error that occurs while waiting for in-flight
> IO completion.
>
> So deleting the controller & failing IOs can be avoided in this situation.
>
> Without these patches, when fail io timeout injection is run, the
> controller can be removed very quickly. With these patches, no controller
> removal is observed, and the controller recovers to a normal state
> once io timeout failure injection is stopped.
>
> Ming Lei (3):
>   blk-mq: add API of blk_mq_queue_frozen
>   nvme: add nvme_frozen
>   nvme-pci: make nvme reset more reliable
>
>  block/blk-mq.c           |  6 ++++++
>  drivers/nvme/host/core.c | 14 ++++++++++++++
>  drivers/nvme/host/nvme.h |  1 +
>  drivers/nvme/host/pci.c  | 37 ++++++++++++++++++++++++++++++-------
>  include/linux/blk-mq.h   |  1 +
>  5 files changed, 52 insertions(+), 7 deletions(-)
>
> --
> 2.25.2
>

Hello Guys,

Ping...

Thanks,
Ming