Re: [PATCH V2 0/3] blk-mq/nvme: improve nvme-pci reset handler

Ming Lei <ming.lei@xxxxxxxxxx> · Thu, 11 Jun 2020 15:27:24 +0800

On Sat, May 30, 2020 at 09:52:18PM +0800, Ming Lei wrote:
> Hi,
> 
> For nvme-pci, after the controller is recovered, the reset handler waits
> for in-flight IOs to complete before updating nr hw queues. If a new
> controller error happens during this period, the nvme-pci driver deletes
> the controller and fails the in-flight IOs. This is too drastic, and
> unfriendly from the user's viewpoint.
> 
> Add APIs for checking whether a queue is frozen, and replace
> nvme_wait_freeze in the nvme-pci reset handler with checking whether all
> ns queues are frozen and whether the controller is disabled. A fresh reset
> can then be scheduled to handle a new controller error that occurs while
> waiting for in-flight IO completion.
> 
> So deleting the controller and failing IOs can be avoided in this situation.
> 
> Without these patches, when io timeout failure injection is run, the
> controller can be removed very quickly. With these patches, no controller
> removal is observed, and the controller recovers to a normal state once
> io timeout failure injection is stopped.
> 
> V2:
> 	- give up after retrying enough times
> 	- add comment on breaking because of shutdown
> 
> Ming Lei (3):
>   blk-mq: add API of blk_mq_queue_frozen
>   nvme: add nvme_frozen
>   nvme-pci: make nvme reset more reliable
> 
>  block/blk-mq.c           |  6 +++++
>  drivers/nvme/host/core.c | 17 +++++++++++++-
>  drivers/nvme/host/nvme.h |  3 +++
>  drivers/nvme/host/pci.c  | 50 +++++++++++++++++++++++++++++++++-------
>  include/linux/blk-mq.h   |  1 +
>  5 files changed, 68 insertions(+), 9 deletions(-)
> 
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
> Cc: Keith Busch <kbusch@xxxxxxxxxx>
> Cc: Max Gurtovoy <maxg@xxxxxxxxxxxx>

Hello Guys,

Ping...

Thanks,
Ming