Re: [bug report] NVMe/IB: reset_controller need more than 1min

Hi Yi Zhang,

thanks for testing the patches.

Can you provide more info on the time it took with both kernels?

The patches aren't intended to decrease this time; they restart the KA (keep-alive) at an earlier stage - as soon as we create the AQ (admin queue).
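Roughly, the idea looks like this (just a sketch against the rdma transport, not the actual patches - it also assumes nvme_start_keep_alive() is callable from the transport, which would need an export from core):

--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 	ret = nvme_rdma_configure_admin_queue(ctrl, new);
 	if (ret)
 		return ret;
+
+	/* arm the keep-alive timer as soon as the admin queue is live,
+	 * instead of waiting for nvme_start_ctrl() at the end of setup
+	 */
+	nvme_start_keep_alive(&ctrl->ctrl);
--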

I guess we need to debug it offline.

On 2/21/2022 12:00 PM, Yi Zhang wrote:
Hi Max

The patches fixed the timeout issue when I used a non-debug kernel,
but when I tested a debug kernel with your patches, the timeout could
still be triggered by "nvme reset/nvme disconnect-all" operations.

On Tue, Feb 15, 2022 at 10:31 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
Thanks Yi Zhang.

A few years ago I sent some patches that were supposed to fix the KA
mechanism, but eventually they weren't accepted.

I haven't tested them since, but maybe you can run some tests with them.

The attached patches are partial and cover only the rdma transport, for
your testing.

If they work for you, we can work on them again and argue for their correctness.

Please don't combine these patches with the workaround we suggested earlier.

-Max.

On 2/15/2022 3:52 PM, Yi Zhang wrote:
Hi Sagi/Max

Changing the value to 10 or 15 fixed the timeout issue.
The reset operation still needs more than 12s in my environment; I
also tried disabling pi_enable, and the reset operation went back
to 3s, so it seems the added 9s comes from the PI-enabled code path.
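(For reference - PI is toggled at connect time; the exact flag spelling below is an assumption, check nvme connect --help on your nvme-cli version:)

# connect with T10-PI enabled (flag name assumed)
nvme connect -t rdma -a <traddr> -n <subsysnqn> --pi-enable
# reconnect without it to compare reset times
nvme connect -t rdma -a <traddr> -n <subsysnqn>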

On Mon, Feb 14, 2022 at 8:12 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
On 2/14/2022 1:32 PM, Sagi Grimberg wrote:
Hi Sagi/Max
Here are more findings from the bisect:

The time for the reset operation changed from 3s [1] to 12s [2] after
commit [3], and after commit [4] the reset operation times out at the
second reset [5]. Let me know if you need any more testing, thanks.
Does this at least eliminate the timeout?
--
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a162f6c6da6e..60e415078893 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout;
   extern unsigned int admin_timeout;
   #define NVME_ADMIN_TIMEOUT     (admin_timeout * HZ)

-#define NVME_DEFAULT_KATO      5
+#define NVME_DEFAULT_KATO      10

   #ifdef CONFIG_ARCH_NO_SG_CHAIN
   #define  NVME_INLINE_SG_CNT  0
--

Or, for an initial test, you can use the --keep-alive-tmo=<10 or 15> flag in
the connect command.
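For example (illustrative address/NQN values, substitute your own):

# reconnect with a 15s keep-alive timeout
nvme connect -t rdma -a 1.1.1.1 -s 4420 -n testnqn --keep-alive-tmo=15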

[1]
# time nvme reset /dev/nvme0

real 0m3.049s
user 0m0.000s
sys 0m0.006s
[2]
# time nvme reset /dev/nvme0

real 0m12.498s
user 0m0.000s
sys 0m0.006s
[3]
commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD)
Author: Max Gurtovoy <maxg@xxxxxxxxxxxx>
Date:   Tue May 19 17:05:56 2020 +0300

       nvme-rdma: add metadata/T10-PI support

[4]
commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD)
Author: Hannes Reinecke <hare@xxxxxxx>
Date:   Fri Apr 16 13:46:20 2021 +0200

       nvme: sanitize KATO setting

This change effectively changed the keep-alive timeout
from 15 to 5, and modified the host to send keep-alives every
2.5 seconds instead of every 5.
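In code terms, the change amounts to roughly this (paraphrased, not the verbatim upstream hunk):

--
static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl)
{
	/* after a70b81bd4d9d: queue the next keep-alive at half the
	 * KATO interval, so the new default KATO of 5s means a
	 * keep-alive every 2.5s
	 */
	queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ / 2);
}
--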

I guess that, in combination with the fact that it now takes longer to
create and delete rdma resources (either QPs or MRs), it starts to
time out in setups where there are a lot of queues.
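As a rough illustration (numbers assumed, not measured): with 64 I/O
queues and ~100ms to create and register each queue's QP/MRs on a debug
kernel, queue setup alone takes ~6.4s - already past a 5s KATO, so the
target tears the controller down mid-reset. With the old 15s KATO the
same setup fits comfortably.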




