On Wed, 2023-11-08 at 12:57 -0700, Mark Lehrer wrote:
> > [  286.547112] nvme nvme4: Connect Invalid Data Parameter, cntlid: 1
> > [  286.555181] nvme nvme4: failed to connect queue: 1 ret=16770
>
> It looks like the admin queue pair (0) worked at least.  The code path
> for the two is a bit different.
>
> This error sounds familiar.  I wonder if there's an error code 16xxx
> cheat sheet out there.
>
> We recently had to downgrade a ConnectX firmware version to fix a
> similar issue, but on a CX7.  I can't remember the firmware versions
> involved but I could probably dig it up.
>
> Have you tried TCP mode?  Whether TCP works or not will be useful
> information for debugging.
>

Hi Mark,

I ended up changing the default kato from 5s to 30s and it is working now.
We no longer jump ship too early, and it connects fine.
See my prior response, where I answered my own message.

diff -Nurp linux-5.14.0-284.25.1.el9_2.orig/drivers/nvme/host/nvme.h linux-5.14.0-284.25.1.el9_2/drivers/nvme/host/nvme.h
--- linux-5.14.0-284.25.1.el9_2.orig/drivers/nvme/host/nvme.h	2023-07-20 08:42:08.000000000 -0400
+++ linux-5.14.0-284.25.1.el9_2/drivers/nvme/host/nvme.h	2023-11-08 14:16:37.924155469 -0500
@@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout;
 extern unsigned int admin_timeout;
 #define NVME_ADMIN_TIMEOUT	(admin_timeout * HZ)

-#define NVME_DEFAULT_KATO	5
+#define NVME_DEFAULT_KATO	30

 #ifdef CONFIG_ARCH_NO_SG_CHAIN
 #define  NVME_INLINE_SG_CNT  0

I will wait for Sagi and Keith and then send a patch.
I had the wrong email for Keith.

Thanks a lot,
Laurence