On Sat, May 30, 2020 at 09:52:18PM +0800, Ming Lei wrote:
> Hi,
>
> For nvme-pci, after the controller is recovered, in-flight IOs are waited
> for before updating nr hw queues. If a new controller error happens during
> this period, the nvme-pci driver deletes the controller and fails the
> in-flight IO. This approach is too violent, and not friendly from the
> user's viewpoint.
>
> Add APIs for checking whether a queue is frozen, and replace
> nvme_wait_freeze in the nvme-pci reset handler with checking whether all
> ns queues are frozen & the controller is disabled. Then a fresh reset can
> be scheduled to handle a new controller error that occurs while waiting
> for in-flight IO completion.
>
> So deleting the controller & failing IOs can be avoided in this situation.
>
> Without these patches, when fail io timeout injection is run, the
> controller can be removed very quickly. With these patches, no controller
> removal is observed, and the controller recovers to normal state after
> io timeout failure injection is stopped.
>
> V2:
> 	- give up after retrying enough times
> 	- add comment on breaking because of shutdown
>
> Ming Lei (3):
>   blk-mq: add API of blk_mq_queue_frozen
>   nvme: add nvme_frozen
>   nvme-pci: make nvme reset more reliable
>
>  block/blk-mq.c           |  6 +++++
>  drivers/nvme/host/core.c | 17 +++++++++++++-
>  drivers/nvme/host/nvme.h |  3 +++
>  drivers/nvme/host/pci.c  | 50 +++++++++++++++++++++++++++++++++-------
>  include/linux/blk-mq.h   |  1 +
>  5 files changed, 68 insertions(+), 9 deletions(-)
>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
> Cc: Keith Busch <kbusch@xxxxxxxxxx>
> Cc: Max Gurtovoy <maxg@xxxxxxxxxxxx>

Hello Guys,

Ping...

Thanks,
Ming