help with "PSN sequence error" in Ubuntu 20.04 (using CX-4 or CX-6 mellanox cards)

Hi folks,

We are encountering an error condition while doing NFSoRDMA, but the
problem seems to be in the RDMA core itself. At some point the client
responds with an RDMA NAK carrying "PSN Sequence error", yet the
network trace shows all the PSNs are accounted for (snippet at the
bottom). It's as if the client lost track of the current PSN.

Questions:
1. Is PSN handling done by the card itself (in hardware/firmware) and
not in the kernel, making this a card/firmware-specific problem? I
looked through the RDMA core and mlx5 driver code to see what would
generate a NAK with this error but couldn't find anything. I only
found counters for nak_seq_error, which made me think this is a
firmware problem.
2. If this is a kernel issue, is it something that has perhaps been
fixed upstream but is not present in Ubuntu?
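
For reference, the sequence-error counters mentioned in question 1 can be watched from user space. This is only a minimal sketch: it assumes the device is named mlx5_0 and the port is 1 (both are placeholders, not taken from our setup) and that the driver exposes its counters under the usual /sys/class/infiniband/<dev>/ports/<port>/hw_counters directory. The helper name read_hw_counters is made up for illustration, and which counter files exist depends on the driver and firmware.

```python
from pathlib import Path


def read_hw_counters(counter_dir, name_filter="seq"):
    """Return {counter_name: value} for counter files whose name contains name_filter.

    counter_dir is expected to be an hw_counters directory in sysfs.
    Missing directories and unreadable or non-numeric files are skipped.
    """
    counters = {}
    directory = Path(counter_dir)
    if not directory.is_dir():
        return counters
    for entry in sorted(directory.iterdir()):
        if name_filter not in entry.name or not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            continue
    return counters


if __name__ == "__main__":
    # Assumed device and port; adjust to the actual system.
    path = "/sys/class/infiniband/mlx5_0/ports/1/hw_counters"
    for name, value in read_hw_counters(path).items():
        print(f"{name}: {value}")
```

Sampling these on the client before and after reproducing the failure would show whether the card itself is counting a sequence error at the moment the NAK appears.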

Thank you for your help.

160 2021-07-22 13:17:52.579023 192.168.100.51 -> 192.168.100.28 NFS
v4.0 reply xid:0x0982a167 PUTFH;GETATTR (PSN: 15729419)
161 2021-07-22 13:17:52.579026 192.168.100.28 -> 192.168.100.51 RRoCE
RC_Acknowledge QP=0x017c PSN=15729419
162 2021-07-22 13:17:52.579247 192.168.100.28 -> 192.168.100.51 NFS
v4.0 call  xid:0x0a82a167 PUTFH;READDIR             DH:0xbee72168
cookie:0 verf:0x0000000000000000 count:8170
163 2021-07-22 13:17:52.579249 192.168.100.51 -> 192.168.100.28 RRoCE
RC_Acknowledge QP=0x0244 PSN=16086680
164 2021-07-22 13:17:52.579631 192.168.100.51 -> 192.168.100.28 RRoCE
RC_RDMA_WRITE_First QP=0x0244 PSN=15729420 size=4096 rkey=0x40000a13
dmalen=9824
165 2021-07-22 13:17:52.579644 192.168.100.51 -> 192.168.100.28 RRoCE
RC_RDMA_WRITE_Middle QP=0x0244 PSN=15729421 size=4096
166 2021-07-22 13:17:52.579652 192.168.100.51 -> 192.168.100.28 RRoCE
RC_RDMA_WRITE_Last QP=0x0244 PSN=15729422 size=1632
167 2021-07-22 13:17:52.579653 192.168.100.51 -> 192.168.100.28 NFS
v4.0 reply xid:0x0a82a167 PUTFH;READDIR
verf:0x0000000000000000 eof:TRUE (PSN: 15729423)
168 2021-07-22 13:17:52.579653 192.168.100.28 -> 192.168.100.51 RRoCE
RC_Acknowledge QP=0x017c PSN=15729420 PSN_SEQ_ERR
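
The claim that all PSNs are accounted for can be checked mechanically. The sketch below takes the PSNs of the packets sent to QP 0x0244 in frames 160 to 167 above and looks for gaps, treating the PSN as a 24-bit counter that wraps around. The function name find_psn_gaps is hypothetical, and the check covers only what the capture shows on the wire, not what the receiving card actually saw.

```python
PSN_MODULUS = 1 << 24  # PSNs are 24-bit values and wrap around


def find_psn_gaps(psns):
    """Return (expected, actual) pairs wherever consecutive PSNs are not sequential."""
    gaps = []
    for prev, cur in zip(psns, psns[1:]):
        expected = (prev + 1) % PSN_MODULUS
        if cur != expected:
            gaps.append((expected, cur))
    return gaps


# PSNs of frames 160, 164, 165, 166 and 167 from the trace above.
trace_psns = [15729419, 15729420, 15729421, 15729422, 15729423]
print(find_psn_gaps(trace_psns))  # prints [] because no PSN is missing
```

The NAK in frame 168 reports PSN 15729420, which is exactly the RDMA_WRITE_First in frame 164 that the trace shows being sent.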


