Re: Mellanox CX6 and nvmet connectivity failure, happens on RHEL9.2 kernels and latest 6.6 upstream

On Wed, 2023-11-08 at 12:57 -0700, Mark Lehrer wrote:
> > [  286.547112] nvme nvme4: Connect Invalid Data Parameter, cntlid: 1
> > [  286.555181] nvme nvme4: failed to connect queue: 1 ret=16770
> 
> It looks like the admin queue pair (0) worked at least.  The code path
> for the two is a bit different.
> 
> This error sounds familiar.  I wonder if there's an error code 16xxx
> cheat sheet out there.
> 
> We recently had to downgrade a ConnectX firmware version to fix a
> similar issue, but on a CX7.  I can't remember the firmware versions
> involved but I could probably dig it up.
> 
> Have you tried TCP mode?  Whether TCP works or not will be useful
> information for debugging.
> 

Hi Mark

I ended up changing the default KATO (keep-alive timeout) from 5s to
30s, and it's working now. We no longer give up too early, and it
connects fine. See my prior response, where I answered my own message.

diff -Nurp linux-5.14.0-284.25.1.el9_2.orig/drivers/nvme/host/nvme.h linux-5.14.0-284.25.1.el9_2/drivers/nvme/host/nvme.h
--- linux-5.14.0-284.25.1.el9_2.orig/drivers/nvme/host/nvme.h	2023-07-20 08:42:08.000000000 -0400
+++ linux-5.14.0-284.25.1.el9_2/drivers/nvme/host/nvme.h	2023-11-08 14:16:37.924155469 -0500
@@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout;
 extern unsigned int admin_timeout;
 #define NVME_ADMIN_TIMEOUT	(admin_timeout * HZ)
 
-#define NVME_DEFAULT_KATO	5
+#define NVME_DEFAULT_KATO	30
 
 #ifdef CONFIG_ARCH_NO_SG_CHAIN
 #define  NVME_INLINE_SG_CNT  0

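On the 16xxx question: as far as I can tell, ret=16770 is 0x4182,
which looks like the Do Not Retry flag (0x4000) OR'ed with the fabrics
status 0x182, and that matches the "Connect Invalid Data Parameter"
text in the log. A rough sketch of the decoding follows. The constant
values are my reading of include/linux/nvme.h and should be checked
against your tree; the two helper names are made up for illustration
and do not exist in the kernel.

```c
/*
 * Constants as I understand them from include/linux/nvme.h.
 * Treat these as assumptions and verify against your own kernel tree.
 */
#define NVME_SC_CONNECT_INVALID_PARAM	0x182
#define NVME_SC_CRD			0x1800	/* command retry delay bits */
#define NVME_SC_MORE			0x2000	/* more info in error log */
#define NVME_SC_DNR			0x4000	/* do not retry */

/*
 * Hypothetical helper (not a kernel function): strip the CRD, More
 * and DNR bits, leaving status code type plus status code.
 */
static unsigned int nvme_status_code(unsigned int ret)
{
	return ret & ~(NVME_SC_CRD | NVME_SC_MORE | NVME_SC_DNR) & 0x7fff;
}

/* Hypothetical helper: non-zero if the Do Not Retry bit is set. */
static int nvme_status_dnr(unsigned int ret)
{
	return (ret & NVME_SC_DNR) != 0;
}
```

With ret=16770, nvme_status_code() gives 0x182 and nvme_status_dnr()
is non-zero, so the host does not retry the I/O queue connect.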

I will wait for Sagi and Keith to respond and then send a patch.
I had the wrong email address for Keith.

Thanks a lot
Laurence




