Hello,
The 1st three patch fixes io hang when controller removal interrupts error
recovery, then queue is left as frozen.
The 4th patch fixes io hang when controller is left as unquiesce.
Ming, what happened to nvme-tcp/rdma move of freeze/unfreeze to the
connect patches?
I'd suggest to handle all drivers(include nvme-pci) in same logic for avoiding
extra maintain burden wrt. error handling, but looks Keith worries about the
delay freezing may cause too many requests queued during error handling, and
that might cause user report.
For nvme-tcp/rdma your patch also addresses IO not failing over because
they block on queue enter. So I definitely want this for fabrics.
The patch in the following link should fix these issues too:
https://lore.kernel.org/linux-block/ZJGmW7lEaipT6saa@xxxxxxxxxxxxxxxxxxxxxxxxx/T/#u
I guess you still want the paired freeze patch because it makes freeze &
unfreeze more reliable in error handling. If yes, I can make one fabric
only change for you.
Not sure exactly what reliability is referred here. I agree that there
is an issue with controller delete during error recovery. The patch
was a way to side-step it, great. But it addressed I/O blocked on enter
and not failing over.
So yes, for fabrics we should have it. I would argue that it would be
the right thing to do for pci as well. But I won't argue if Keith feels
otherwise.