On 12/11/21 5:01 AM, Yi Zhang wrote:
On Fri, Jun 25, 2021 at 12:14 AM Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
Hello
Gentle ping here, this issue still exists on the latest 5.13-rc7
# time nvme reset /dev/nvme0
real 0m12.636s
user 0m0.002s
sys 0m0.005s
# time nvme reset /dev/nvme0
real 0m12.641s
user 0m0.000s
sys 0m0.007s
Strange that even normal resets take so long...
What device are you using?
Hi Sagi
Here is the device info:
Mellanox Technologies MT27700 Family [ConnectX-4]
# time nvme reset /dev/nvme0
real 1m16.133s
user 0m0.000s
sys 0m0.007s
There seems to be a spurious command timeout here, but maybe that is
because the queues take so long to connect that the target's
keep-alive timer expires.
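To spell this out: the target arms a per-controller keep-alive timer for
KATO seconds, and if it fires before the host has sent a keep-alive
command, the controller is fatally torn down, taking the half-connected
queues with it. Roughly this expiry path (a sketch only, using the
drivers/nvme/target field names, not the exact code):
--
/*
 * Sketch of the target-side keep-alive expiry (illustrative, not the
 * exact nvmet code): ka_work is a per-controller delayed work armed
 * for KATO seconds; if it fires because no keep-alive arrived in time,
 * the controller is torn down, killing any queues still connecting.
 */
static void ka_timer_expired_sketch(struct work_struct *work)
{
        struct nvmet_ctrl *ctrl = container_of(to_delayed_work(work),
                                               struct nvmet_ctrl, ka_work);

        pr_err("ctrl %d keep-alive timer (%d seconds) expired!\n",
               ctrl->cntlid, ctrl->kato);

        /* Fatal error: disconnects all queues of this controller. */
        nvmet_ctrl_fatal_error(ctrl);
}
--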
Does this patch help?
The issue still exists; let me know if you need more testing for it. :)
Hi Sagi
Ping, this issue can still be reproduced on the latest
linux-block/for-next. Do you have a chance to recheck it? Thanks.
Can you check if it happens with the below patch:
--
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index f91a56180d3d..6e5aadfb07a0 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -191,6 +191,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
 		}
 	}
 
+	/*
+	 * Controller establishment flow may take some time, and the host may not
+	 * send us keep-alive during this period, hence reset the
+	 * traffic based keep-alive timer so we don't trigger a
+	 * controller teardown as a result of a keep-alive expiration.
+	 */
+	ctrl->reset_tbkas = true;
+
 	return 0;
 
 err:
--
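The idea is that the keep-alive work consumes the flag: if reset_tbkas
was set since the last expiry (by traffic, or with this patch by a
queue installation), the work re-arms itself instead of declaring a
fatal error. A simplified sketch of that consumer side (illustrative,
not necessarily the exact upstream nvmet_keep_alive_timer()):
--
/*
 * Sketch of how the flag set in nvmet_install_queue() is consumed by
 * the keep-alive work (simplified): a set reset_tbkas is treated as
 * implicit traffic, so the timer is re-armed rather than tearing the
 * controller down.
 */
static void ka_timer_sketch(struct work_struct *work)
{
        struct nvmet_ctrl *ctrl = container_of(to_delayed_work(work),
                                               struct nvmet_ctrl, ka_work);
        bool reset_tbkas = ctrl->reset_tbkas;

        ctrl->reset_tbkas = false;
        if (reset_tbkas) {
                /* A queue was just installed (or traffic was seen): re-arm. */
                schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
                return;
        }

        pr_err("ctrl %d keep-alive timer (%d seconds) expired!\n",
               ctrl->cntlid, ctrl->kato);
        nvmet_ctrl_fatal_error(ctrl);
}
--
With nvmet_install_queue() setting the flag for every queue it installs,
a slow multi-queue connect keeps pushing the expiry out instead of
triggering a controller teardown halfway through.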