On Mon, Jul 10, 2023 at 11:23:44AM +0300, Sagi Grimberg wrote:
> 
> > > > > namespace's request queue is frozen and quiesced during error
> > > > > recovery, writeback IO is blocked in bio_queue_enter(), so
> > > > > fsync_bdev() <- del_gendisk() can't move on, and causes IO hang.
> > > > > Removal could be from sysfs, hard unplug or error handling.
> > > > > 
> > > > > Fix this kind of issue by marking the controller as DEAD if removal
> > > > > breaks error recovery.
> > > > > 
> > > > > This way is reasonable too, because the controller can't be
> > > > > recovered any more after being removed.
> > > > 
> > > > This looks fine to me Ming,
> > > > Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
> > > > 
> > > > I still want your patches for tcp/rdma that move the freeze.
> > > > If you are not planning to send them, I swear I will :)
> > > 
> > > Ming, can you please send the tcp/rdma patches that move the
> > > freeze? As I said before, it addresses an existing issue with
> > > requests unnecessarily blocked on a frozen queue instead of
> > > failing over.
> > 
> > Any chance to fix the current issue in one easy (backportable) way[1]
> > first?
> 
> There is, you suggested one. And I'm requesting you to send a patch for
> it.

The patch is the one pointed to by link [1], and it can still be applied
to the current Linus tree.

https://lore.kernel.org/linux-nvme/20230629064818.2070586-1-ming.lei@xxxxxxxxxx/

> > All previous discussions on delay freeze[2] are generic, and apply to
> > all nvme drivers, not to mention this error handling difference causes
> > extra maintenance burden. I still suggest converting all drivers in the
> > same way, and will work along the approach[1] aiming for v6.6.
> 
> But we obviously hit a difference in expectations from different
> drivers. In tcp/rdma there is currently an _existing_ bug, where
> we freeze the queue on error recovery, and unfreeze only after we
> reconnect. In the meantime, requests can be blocked on the frozen
> request queue and not failover like they should.
> 
> In fabrics the delta between error recovery and reconnect can (and
> often will be) minutes or more. Hence I request that we solve _this_
> issue which is addressed by moving the freeze to the reconnect path.
> 
> I personally think that pci should do this as well, and at least
> dual-ported multipath pci devices would prefer instant failover
> rather than after a full reset cycle. But Keith disagrees and I am
> not going to push for it.
> 
> Regardless of anything we do in pci, the tcp/rdma transport
> freeze-blocking-failover _must_ be addressed.

It is one generic issue: freeze/unfreeze has to be paired strictly for
every driver. For any nvme driver, the imbalance can happen when error
handling is involved, which is why I suggest fixing the issue in one
generic way.

> So can you please submit a patch for each? Please phrase it as what
> it is, a bug fix, so stable kernels can pick it up. And try to keep
> it isolated to _only_ the freeze change so that it is easily
> backportable.

The patch "[PATCH V2] nvme: mark ctrl as DEAD if removing from error
recovery" can fix them all (including nvme tcp/fc's issue), and can be
backported.
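
For context on why marking the controller DEAD unblocks the hang:
nvme_remove_namespaces() already fails all namespace queues for a DEAD
controller, along these lines (current core code, slightly abridged,
comments mine):

	if (ctrl->state == NVME_CTRL_DEAD) {
		/* fail all pending and future IO on the NS queues */
		nvme_mark_namespaces_dead(ctrl);
		/* unquiesce so the failed IO can actually complete */
		nvme_unquiesce_io_queues(ctrl);
	}

Once a removal that breaks error recovery ends up in that path, writeback
IO blocked in bio_queue_enter() completes with an error instead of
waiting on the frozen queue, and fsync_bdev() <- del_gendisk() can make
progress.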
But as we discussed, we still want to call freeze/unfreeze in pair, and I
also suggest the following approach[2], which isn't good to backport:

1) move freeze into reset

2) during resetting:

- freeze NS queues
- unquiesce NS queues
- nvme_wait_freeze()
- update_nr_hw_queues
- unfreeze NS queues

3) meantime change the driver's ->queue_rq() in case the ctrl state is
NVME_CTRL_CONNECTING:

- if the request is FS IO with data, re-submit all bios of this request,
  and free the request
- otherwise, fail the request

Rough sketches of 2) and 3) are below.

[2] https://lore.kernel.org/linux-block/5bddeeb5-39d2-7cec-70ac-e3c623a8fca6@xxxxxxxxxxx/T/#mfc96266b63eec3e4154f6843be72e5186a4055dc
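
For 2), using the current core helpers (a sketch only; nr_hw_queues
stands for whatever IO queue count is negotiated after reset/reconnect):

	/* inside reset/reconnect, with the freeze moved here */
	nvme_start_freeze(ctrl);
	nvme_unquiesce_io_queues(ctrl);
	nvme_wait_freeze(ctrl);
	blk_mq_update_nr_hw_queues(ctrl->tagset, nr_hw_queues);
	nvme_unfreeze(ctrl);

For 3), the ->queue_rq() side could look roughly like the sketch below;
the helper name is made up, and it is modelled on what
nvme_failover_req() already does with blk_steal_bios():

	/*
	 * Sketch only: requeue FS IO at bio level while the ctrl is
	 * NVME_CTRL_CONNECTING, instead of keeping the request queue
	 * frozen across the whole recovery window.
	 */
	static blk_status_t nvme_queue_rq_connecting(struct request *rq)
	{
		struct bio_list bios;
		struct bio *bio;

		/* passthrough or data-less requests can only be failed */
		if (blk_rq_is_passthrough(rq) || !rq->bio)
			return BLK_STS_IOERR;

		bio_list_init(&bios);
		/* detach all bios from the request, then free it */
		blk_steal_bios(&bios, rq);
		blk_mq_end_request(rq, BLK_STS_OK);

		/* re-enter bio submission for each stolen bio */
		while ((bio = bio_list_pop(&bios)))
			submit_bio_noacct(bio);

		return BLK_STS_OK;
	}

(A real implementation would likely defer the re-submission to a work
item, like nvme multipath's requeue_work, instead of calling
submit_bio_noacct() directly from ->queue_rq() context.)

thanks,
Ming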