Re: [PATCH V3 7/8] nvme: pci: recover controller reliably

Ming Lei <tom.leiming@xxxxxxxxx> · Sat, 5 May 2018 08:16:37 +0800

On Fri, May 4, 2018 at 4:28 PM, jianchao.wang
<jianchao.w.wang@xxxxxxxxxx> wrote:
> Hi ming
>
> On 05/04/2018 04:02 PM, Ming Lei wrote:
>>> nvme_error_handler should invoke nvme_reset_ctrl instead of introducing another interface.
>>> Then it is more convenient to ensure that there will be only one resetting instance running.
>>>
>> But as you mentioned above, reset_work has to be split into two
>> contexts to handle an IO timeout during wait_freeze in reset_work,
>> so a single instance of nvme_reset_ctrl() may not work well.
>
> I mean the EH kthread and the reset_work, which could both reset the ctrl, rather than
> the pre- and post-reset contexts.
>
> Honestly, I somewhat doubt whether it is worth trying to recover from [1].
> The EH kthread solution could make things easier, but the code for recovering from [1] has
> made things really complicated. It also makes it more difficult to unify nvme-pci, rdma and fc.
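
The suggestion quoted above, routing nvme_error_handler through nvme_reset_ctrl so that only one reset instance runs, comes down to a single-owner guard. Below is a minimal user-space sketch of such a guard using a C11 compare-and-swap. `struct reset_gate`, `GATE_LIVE`, `GATE_RESETTING`, `gate_try_start_reset()` and `gate_finish_reset()` are hypothetical names for illustration only, not nvme driver code; the in-tree driver appears to gate resets through its controller state transitions rather than through this exact code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states, loosely modeled on a controller state machine. */
enum gate_state { GATE_LIVE, GATE_RESETTING };

/* Hypothetical guard object; not an nvme driver structure. */
struct reset_gate {
	_Atomic int state;
};

/*
 * Returns true only for the single caller that wins the
 * LIVE -> RESETTING transition; every other caller gets false.
 */
static bool gate_try_start_reset(struct reset_gate *g)
{
	int expected = GATE_LIVE;

	return atomic_compare_exchange_strong(&g->state, &expected,
					      GATE_RESETTING);
}

/* Called by the reset owner once its reset has finished. */
static void gate_finish_reset(struct reset_gate *g)
{
	atomic_store(&g->state, GATE_LIVE);
}
```

Only the winner of the transition proceeds; a second request fails until gate_finish_reset() runs. This also illustrates the concern raised in the reply: if the owner blocks (for example in wait_freeze) and an IO timeout fires, the timeout path cannot start another reset through the same gate.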

Another choice may be nested EH, which should be easier to implement:

- run the whole recovery procedure (shutdown & reset) in one single context
- start a new context to handle any new timeout that fires during the
previous recovery, in the same way
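
The nested EH idea can be modeled in a few lines. This is a minimal single-threaded sketch under stated assumptions: `struct eh_ctrl`, `eh_sim_reset()`, `eh_recover()` and `EH_MAX_LEVEL` are hypothetical; a counter of pending timeouts stands in for real IO timeouts; recursion stands in for starting a new EH context (a real driver would presumably use a separate kthread or work item); and the nesting cap is an added assumption, not part of the proposal above.

```c
#include <stdbool.h>

/* Hypothetical cap on nesting depth; an assumption, not from the patch. */
#define EH_MAX_LEVEL 4

/* Hypothetical controller model; these names do not exist in nvme-pci. */
struct eh_ctrl {
	int pending_timeouts;	/* resets that will still hit an IO timeout */
	int shutdowns;		/* shutdown steps executed */
	int resets;		/* reset steps started */
	int max_level;		/* deepest EH context that ran a recovery */
	bool live;		/* set once a reset completes without timeout */
	bool gave_up;		/* set if nesting exceeded EH_MAX_LEVEL */
};

/* Simulated reset step: returns false if a new timeout fires during it. */
static bool eh_sim_reset(struct eh_ctrl *c)
{
	c->resets++;
	if (c->pending_timeouts > 0) {
		c->pending_timeouts--;
		return false;
	}
	c->live = true;
	return true;
}

/*
 * One EH context runs the whole recovery (shutdown & reset) by itself.
 * A timeout during that recovery is not handled inline; a new EH context
 * at level + 1 handles it in the same way. Recursion stands in for
 * "start a new context" in this single-threaded model.
 */
static void eh_recover(struct eh_ctrl *c, int level)
{
	if (level > EH_MAX_LEVEL) {
		/* stop nesting; a real driver would presumably fail the ctrl */
		c->gave_up = true;
		return;
	}
	if (level > c->max_level)
		c->max_level = level;

	c->live = false;
	c->shutdowns++;			/* shutdown step */

	if (!eh_sim_reset(c))
		eh_recover(c, level + 1);	/* nested EH for the new timeout */
}
```

With two pending timeouts, eh_recover(&c, 0) runs three full shutdown & reset passes at levels 0, 1 and 2 and ends with the controller live. Each context stays a straight-line procedure with no split into pre- and post-reset halves; the price is that the nesting depth needs some bound.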

The two approaches are just like sync IO vs AIO.

Thanks,
Ming Lei