On Sun, Jul 09, 2023 at 10:38:29AM +0300, Sagi Grimberg wrote:
>
> > > namespace's request queue is frozen and quiesced during error recovery,
> > > writeback IO is blocked in bio_queue_enter(), so fsync_bdev() <-
> > > del_gendisk()
> > > can't move on, and causes IO hang. Removal could be from sysfs, hard
> > > unplug or error handling.
> > >
> > > Fix this kind of issue by marking controller as DEAD if removal breaks
> > > error recovery.
> > >
> > > This way is reasonable too, because controller can't be recovered any
> > > more after being removed.
> >
> > This looks fine to me Ming,
> > Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
> >
> >
> > I still want your patches for tcp/rdma that move the freeze.
> > If you are not planning to send them, I swear I will :)
>
> Ming, can you please send the tcp/rdma patches that move the
> freeze? As I said before, it addresses an existing issue with
> requests unnecessarily blocked on a frozen queue instead of
> failing over.

Any chance of fixing the current issue in one easy (backportable) way[1]
first?

All previous discussions on delaying the freeze[2] are generic and apply
to all nvme drivers, not to mention that this difference in error handling
adds extra maintenance burden. I still suggest converting all drivers in
the same way, and I will work along approach[1], aiming for v6.6.

[1] https://lore.kernel.org/linux-nvme/20230629064818.2070586-1-ming.lei@xxxxxxxxxx/
[2] https://lore.kernel.org/linux-block/5bddeeb5-39d2-7cec-70ac-e3c623a8fca6@xxxxxxxxxxx/T/#mfc96266b63eec3e4154f6843be72e5186a4055dc

Thanks,
Ming