On Mon, Jul 10, 2023 at 12:27:31PM +0300, Sagi Grimberg wrote:
> > > > > > I still want your patches for tcp/rdma that move the freeze.
> > > > > > If you are not planning to send them, I swear I will :)
> > > > >
> > > > > Ming, can you please send the tcp/rdma patches that move the
> > > > > freeze? As I said before, it addresses an existing issue with
> > > > > requests unnecessarily blocked on a frozen queue instead of
> > > > > failing over.
> > > >
> > > > Any chance to fix the current issue in one easy (backportable) way[1] first?
> > >
> > > There is, you suggested one. And I'm requesting you to send a patch for
> > > it.
> >
> > The patch is the one pointed to by link [1], and it can still be applied on
> > the current linus tree.
> >
> > https://lore.kernel.org/linux-nvme/20230629064818.2070586-1-ming.lei@xxxxxxxxxx/
>
> This is separate from what I am talking about.
>
> > > > All previous discussions on delayed freeze[2] are generic and apply to all
> > > > nvme drivers, not to mention that this difference in error handling causes
> > > > an extra maintenance burden. I still suggest converting all drivers in the
> > > > same way, and will work along the approach[1] aiming for v6.6.
> > >
> > > But we obviously hit a difference in expectations from different
> > > drivers. In tcp/rdma there is currently an _existing_ bug, where
> > > we freeze the queue on error recovery, and unfreeze only after we
> > > reconnect. In the meantime, requests can be blocked on the frozen
> > > request queue and not failover like they should.
> > >
> > > In fabrics the delta between error recovery and reconnect can (and
> > > often will be) minutes or more. Hence I request that we solve _this_
> > > issue which is addressed by moving the freeze to the reconnect path.
> > >
> > > I personally think that pci should do this as well, and at least
> > > dual-ported multipath pci devices would prefer instant failover
> > > rather than after a full reset cycle. But Keith disagrees and I am
> > > not going to push for it.
> > >
> > > Regardless of anything we do in pci, the tcp/rdma transport
> > > freeze-blocking-failover _must_ be addressed.
> >
> > It is one generic issue: freeze/unfreeze has to be paired strictly
> > for every driver.
> >
> > For any nvme driver, the imbalance can happen when error handling
> > is involved, which is why I suggest fixing the issue in one generic
> > way.
>
> Ming, you are ignoring what I'm saying. I don't care if the
> freeze/unfreeze is 100% balanced or not (for the sake of this
> discussion).
>
> I'm talking about a _separate_ issue where a queue
> is frozen for potentially many minutes blocking requests that
> could otherwise failover.
>
> > > So can you please submit a patch for each? Please phrase it as what
> > > it is, a bug fix, so stable kernels can pick it up. And try to keep
> > > it isolated to _only_ the freeze change so that it is easily
> > > backportable.
> >
> > The patch "[PATCH V2] nvme: mark ctrl as DEAD if removing from error
> > recovery" can fix them all (including nvme tcp/fc's issue), and can be
> > backported.
>
> Ming, this is completely separate from what I'm talking about. This one
> is addressing when the controller is removed, while I'm talking about
> the error-recovery and failover, which is ages before the controller is
> removed.
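Just to make sure we are talking about the same change, below is a rough
sketch of what I understand you are asking for in nvme-tcp (rdma would be
the same idea): stop starting the freeze in the error-recovery teardown,
and only freeze around the tagset update in the reconnect path. The call
flow is simplified from memory, the exact function names and call sites
are approximate, and this is not a tested patch:

    /* today (roughly): freeze starts at error-recovery time */
    nvme_tcp_error_recovery_work()
        nvme_tcp_teardown_io_queues(ctrl, false)
            nvme_start_freeze(ctrl);        /* NS queues frozen here */
            /* quiesce, stop queues, cancel inflight requests */
        nvme_unquiesce_io_queues(ctrl);     /* inflight requests can fail over,
                                             * but new FS bios now sleep in
                                             * bio_queue_enter() until reconnect
                                             * unfreezes the queues, which may
                                             * take minutes */

    /* proposed: freeze only around the reconnect/tagset update */
    nvme_tcp_configure_io_queues(ctrl, new /* false on reconnect */)
        nvme_start_freeze(ctrl);            /* moved here from teardown */
        nvme_wait_freeze(ctrl);
        blk_mq_update_nr_hw_queues(ctrl->tagset, nr_io_queues);
        nvme_unfreeze(ctrl);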
>
> > But as we discussed, we still want to call freeze/unfreeze in pairs, and
> > I also suggest the following approach[2], which isn't good to backport:
> >
> > 1) move freeze into reset
> >
> > 2) during resetting:
> >
> >    - freeze NS queues
> >    - unquiesce NS queues
> >    - nvme_wait_freeze()
> >    - update_nr_hw_queues
> >    - unfreeze NS queues
> >
> > 3) meanwhile change the driver's ->queue_rq() for the case where the ctrl
> >    state is NVME_CTRL_CONNECTING:
> >
> >    - if the request is FS IO with data, re-submit all bios of this request
> >      and free the request
> >
> >    - otherwise, fail the request
> >
> > [2] https://lore.kernel.org/linux-block/5bddeeb5-39d2-7cec-70ac-e3c623a8fca6@xxxxxxxxxxx/T/#mfc96266b63eec3e4154f6843be72e5186a4055dc
>
> Ming, please read again what my concern is. I'm talking about error recovery
> freezing a queue, and unfreezing only after we reconnect,
> blocking requests that should failover.

From my understanding, nothing is special about tcp/rdma compared with
nvme-pci. All of them take a two-stage error recovery: teardown &
[reset (nvme-pci) | reconnect (tcp/rdma)].

Queues are frozen during teardown, and unfrozen in reset or reconnect. If
the 2nd stage fails or is bypassed, queues can be left frozen & unquiesced,
requests can't be handled, and IO hangs.

When tcp reconnect fails, nvme_delete_ctrl() is called for failing requests
& removing the controller. Then the patch "nvme: mark ctrl as DEAD if
removing from error recovery" can avoid this issue by calling
blk_mark_disk_dead(), which fails any request pending in bio_queue_enter().

If that isn't tcp/rdma's issue, can you explain it in detail? At least, from
what you mentioned in the following link, it seems to be the same issue I am
trying to address.

https://lore.kernel.org/linux-nvme/b11743c1-6c58-5f7a-8dc9-2a1a065835d0@xxxxxxxxxxx/T/#m5f07ee01cdc99b0b38305d8171e9085921df2bc2

Thanks,
Ming
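P.S. For completeness, here is a rough sketch of the two-stage flow described
above, and of where the "mark ctrl as DEAD" patch helps. Again, this is only
an illustration, not actual code from any driver, and the helper names are
approximate:

    /* 1st stage: teardown (same pattern for pci/tcp/rdma) */
    error recovery:
        nvme_start_freeze(ctrl);        /* NS queues frozen */
        /* quiesce queues, cancel/fail inflight requests */
        nvme_unquiesce_io_queues(ctrl);

    /* 2nd stage: reset (pci) or reconnect (tcp/rdma) */
    reset / reconnect:
        /* re-create queues, blk_mq_update_nr_hw_queues(), ... */
        nvme_unfreeze(ctrl);            /* NS queues unfrozen again */

    /*
     * If the 2nd stage fails or is bypassed, the queues stay frozen, so new
     * bios sleep in bio_queue_enter() and IO hangs.
     */

    /* with "nvme: mark ctrl as DEAD if removing from error recovery" */
    nvme_delete_ctrl()
        blk_mark_disk_dead(ns->disk);   /* fails the bios waiting in
                                         * bio_queue_enter() */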