On Fri, Apr 02, 2021 at 01:08:41PM -0700, Sagi Grimberg wrote:
> The below patches caused a regression in a multipath setup:
> Fixes: 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic")
> Fixes: 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic")
>
> These patches on their own are correct because they fixed a controller reset
> regression.
>
> When we reset/teardown a controller, we must freeze and quiesce the namespaces
> request queues to make sure that we safely stop inflight I/O submissions.
> Freeze is mandatory because if our hctx map changed between reconnects,
> blk_mq_update_nr_hw_queues will immediately attempt to freeze the queue, and
> if it still has pending submissions (that are still quiesced) it will hang.
> This is what the above patches fixed.
>
> However, by freezing the namespaces request queues, and only unfreezing them
> when we successfully reconnect, inflight submissions that are running
> concurrently can now block grabbing the nshead srcu until either we successfully
> reconnect or ctrl_loss_tmo expired (or the user explicitly disconnected).
>
> This caused a deadlock [1] when a different controller (different path on the
> same subsystem) became live (i.e. optimized/non-optimized). This is because
> nvme_mpath_set_live needs to synchronize the nshead srcu before requeueing I/O
> in order to make sure that current_path is visible to future (re)submissions.
> However the srcu lock is taken by a blocked submission on a frozen request
> queue, and we have a deadlock.
>
> In recent kernels (v5.9+) direct_make_request was replaced by submit_bio_noacct
> which does not have this issue because its bio_list will be active when
> nvme-mpath calls submit_bio_noacct on the bottom device (because it was
> populated when submit_bio was triggered on it).
>
> Hence, we need to fix all the kernels that were before submit_bio_noacct was
> introduced.

Why can we not just add submit_bio_noacct to the 5.4 kernel to correct
this?  What commit id is that?

thanks,

greg k-h