Re: Mellanox CX6 and nvmet connectivity failure, happens on RHEL9.2 kernels and latest 6.6 upstream

On Wed, 2023-11-08 at 15:07 -0500, Laurence Oberman wrote:
> On Wed, 2023-11-08 at 12:57 -0700, Mark Lehrer wrote:
> > > [  286.547112] nvme nvme4: Connect Invalid Data Parameter, cntlid: 1
> > > [  286.555181] nvme nvme4: failed to connect queue: 1 ret=16770
> > 
> > It looks like the admin queue pair (0) worked at least.  The code path
> > for the two is a bit different.
> > 
> > This error sounds familiar.  I wonder if there's an error code 16xxx
> > cheat sheet out there.
> > 
> > We recently had to downgrade a ConnectX firmware version to fix a
> > similar issue, but on a CX7.  I can't remember the firmware versions
> > involved, but I could probably dig them up.
> > 
> > Have you tried TCP mode?  Whether TCP works or not will be useful
> > information for debugging.
> > 
> 
> Hi Mark,
> 
> I landed up changing the default kato (keep-alive timeout) from 5s to
> 30s, and it's working now.
> We no longer jump ship too early, and it connects fine.
> See my prior response, where I answered my own message.
> 
> diff -Nurp linux-5.14.0-284.25.1.el9_2.orig/drivers/nvme/host/nvme.h linux-5.14.0-284.25.1.el9_2/drivers/nvme/host/nvme.h
> --- linux-5.14.0-284.25.1.el9_2.orig/drivers/nvme/host/nvme.h   2023-07-20 08:42:08.000000000 -0400
> +++ linux-5.14.0-284.25.1.el9_2/drivers/nvme/host/nvme.h        2023-11-08 14:16:37.924155469 -0500
> @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout;
>  extern unsigned int admin_timeout;
>  #define NVME_ADMIN_TIMEOUT     (admin_timeout * HZ)
>  
> -#define NVME_DEFAULT_KATO      5
> +#define NVME_DEFAULT_KATO      30
>  
>  #ifdef CONFIG_ARCH_NO_SG_CHAIN
>  #define  NVME_INLINE_SG_CNT  0
> 
> 
> I will wait for Sagi and Keith and then send a patch.
> I had the wrong email address for Keith.
> 
> Thanks a lot
> Laurence
> 

Hello

No fix is needed; I was unaware of the -k (--keep-alive-tmo) option to
nvme connect. A colleague showed it to me.
This works and gives the CX6 longer to handle the connection:

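#!/bin/bash
# Load the NVMe over RDMA host transport (the connection below uses -t rdma)
modprobe nvme-rdma
# -k sets the keep-alive timeout in seconds (30 instead of the 5s default)
nvme connect -t rdma -n nqn.2023-10.org.dell -a 172.18.60.2 -s 4420 -k 30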


Thanks.
So, a heads-up for these newer cards: they seem to need more time to connect.

Regards
Laurence








