On Wed, 2023-11-08 at 15:55 -0500, Laurence Oberman wrote:
> On Wed, 2023-11-08 at 15:07 -0500, Laurence Oberman wrote:
> > On Wed, 2023-11-08 at 12:57 -0700, Mark Lehrer wrote:
> > > > [ 286.547112] nvme nvme4: Connect Invalid Data Parameter, cntlid: 1
> > > > [ 286.555181] nvme nvme4: failed to connect queue: 1 ret=16770
> > >
> > > It looks like the admin queue pair (0) worked at least. The code path
> > > for the two is a bit different.
> > >
> > > This error sounds familiar. I wonder if there's an error code 16xxx
> > > cheat sheet out there.
> > >
> > > We recently had to downgrade a ConnectX firmware version to fix a
> > > similar issue, but on a CX7. I can't remember the firmware versions
> > > involved but I could probably dig it up.
> > >
> > > Have you tried TCP mode? Whether TCP works or not will be useful
> > > information for debugging.
> > >
> >
> > Hi MArk
> >
> > I landed up changing the default kato from 5s to 30 and its working now
> > We don't jump ship too early anymore and it connects fine.
> > See prior response where I answered my own message
> >
> > diff -Nurp linux-5.14.0-284.25.1.el9_2.orig/drivers/nvme/host/nvme.h linux-5.14.0-284.25.1.el9_2/drivers/nvme/host/nvme.h
> > --- linux-5.14.0-284.25.1.el9_2.orig/drivers/nvme/host/nvme.h 2023-07-20 08:42:08.000000000 -0400
> > +++ linux-5.14.0-284.25.1.el9_2/drivers/nvme/host/nvme.h 2023-11-08 14:16:37.924155469 -0500
> > @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout;
> >  extern unsigned int admin_timeout;
> >  #define NVME_ADMIN_TIMEOUT (admin_timeout * HZ)
> >
> > -#define NVME_DEFAULT_KATO 5
> > +#define NVME_DEFAULT_KATO 30
> >
> >  #ifdef CONFIG_ARCH_NO_SG_CHAIN
> >  #define NVME_INLINE_SG_CNT 0
> >
> >
> > I will wait for Sagi and Keith and then send a patch
> > I had the wrong email for Keith
> >
> > Thanks a lot
> > Laurence
> >
>
> Hello
>
> No fix needed, I was unaware of the -k option in the nvme connect.
> My colleague showed it to me.
> This works now to give the CX6 longer to handle the connection
>
> #!/bin/bash
> modprobe nvme-fc
> nvme connect -t rdma -n nqn.2023-10.org.dell -a 172.18.60.2 -s 4420 -k 30
>
>
> Thanks
> So a Heads up for these newer cards I guess, need more time
>
> Regards
> Laurence
>
>
>

Finalizing this discussion and adding appropriate cc's.

No patch needed. I was unaware of the -k (keep-alive timeout) option in
nvme connect. My colleague John Pittman showed it to me, and in fact Mark
also pointed it out in a follow-up email.
Raising it from the default of 5 seconds to 30 gives the CX6 longer to
handle the connection, and the connect now works.

C.K, thanks to you as well for responding.

Initiator:

#!/bin/bash
modprobe nvme-fc
nvme connect -t rdma -n nqn.2023-10.org.dell -a 172.18.60.2 -s 4420 -k 30

Thanks
So a heads-up for these newer cards I guess: they need more time.
Learn something new every day.
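
For the archives, on Mark's question earlier in the thread about a cheat
sheet for the 16xxx codes: below is a minimal sketch of how ret=16770 can be
split apart. It assumes the bit layout in include/linux/nvme.h as I read it
(NVME_SC_DNR = 0x4000 for "do not retry", the low 11 bits holding the status
code type and status code, and 0x182 being NVME_SC_CONNECT_INVALID_PARAM).
The function name decode_nvme_status is just something I made up for
illustration, and only the one code seen in this thread is named.

```shell
#!/bin/bash
# Sketch only: decode a Linux NVMe host driver return value.
# Assumed layout (from include/linux/nvme.h, please double check):
#   bit 14 (0x4000) = NVME_SC_DNR, "do not retry"
#   bits 0-10 (0x7ff) = status code type + status code
decode_nvme_status() {
    local ret=$1
    local dnr=$(( (ret & 0x4000) != 0 ))   # 1 if the DNR flag is set
    local sc=$(( ret & 0x7ff ))            # strip the flag bits
    local hex name
    printf -v hex '0x%x' "$sc"
    case "$hex" in
        0x182) name="NVME_SC_CONNECT_INVALID_PARAM" ;;
        *)     name="not in this sketch" ;;
    esac
    echo "dnr=$dnr sc=$hex ($name)"
}

# The value from the log in this thread: 16770 = 0x4000 + 0x182
decode_nvme_status 16770
```

That lines up with the "Connect Invalid Data Parameter" message logged just
before the failed queue connect.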