Could anyone help with this issue? I've spent several days on it, and
neither "tcp_retries2" nor mount options such as "timeo" and "retrans"
make the client give up retransmission any earlier.

Regards,
Zhitao Li

On Fri, May 31, 2024 at 3:28 PM Zhitao Li <zhitao.li@xxxxxxxxxx> wrote:
>
> This problem is a duplicate of
> https://lore.kernel.org/linux-nfs/YQBPR01MB10724B629B69F7969AC6BDF9586C89@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> According to that discussion, a patch has been submitted to fix the
> timeout used in xprt_socket, but it is still on the way. In the
> meantime, "tcp_retries2" does not control the transmission timeout of
> an unacknowledged packet. Is there any workaround to change the
> transmission timeout?
>
> Best regards,
> Zhitao Li
>
> On Wed, May 29, 2024 at 6:18 PM Zhitao Li <zhitao.li@xxxxxxxxxx> wrote:
> >
> > Essentially, we need a mechanism to quickly reconnect to new
> > nfs-server nodes for failover.
> > I also tried adjusting mount options, setting "timeo" to 10s and
> > "retrans" to 1, and found that they don't work either. It seems that
> > the NFS v3 client always tries to reconnect after some request hangs
> > for 3 minutes, no matter what "timeo" and "retrans" are.
> >
> > On Wed, May 29, 2024 at 6:10 PM Zhitao Li <zhitao.li@xxxxxxxxxx> wrote:
> > >
> > > Hi, dear community,
> > >
> > > In our NFS environment, the NFS client mounts a remote NFS export
> > > via its VIP. The VIP can be assigned to another server node for
> > > failover. However, the NFS client only resends the unacknowledged
> > > packet 50s+ after the VIP is ready on the new node, because of the
> > > exponential backoff retransmission algorithm. I tried lowering
> > > "tcp_retries2" so that the NFS client would reconnect to the new
> > > node more quickly, but the parameter didn't take effect. From the
> > > tcpdump entries below:
> > > 1. At "2024-05-29 11:47:00", ARP is updated.
> > > 2. At "2024-05-29 11:47:52", the NFS client resent the packet.
> > > 3. Then the connection is reset and a new connection starts.
> > >
> > > I guess the parameter only takes effect for applications and not
> > > for kernel modules like the NFS client. Could anyone advise how to
> > > customize the retransmission timeout of an unacknowledged NFS v3
> > > TCP packet?
> > >
> > >
> > > OS: Linux kernel v6.7.0
> > > NFS mount options:
> > > vers=3,nolock,proto=tcp,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport
> > >
> > > tcp_retries2:
> > > [root@vm-play zhitaoli]# sysctl -w net.ipv4.tcp_retries2=5
> > > net.ipv4.tcp_retries2 = 5
> > > [root@vm-play zhitaoli]# cat /proc/sys/net/ipv4/tcp_retries2
> > > 5
> > >
> > > tcpdump entries:
> > >
> > > 2024-05-29 11:46:02.331891 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973659245 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:46:02.542836 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973659456 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:46:02.751013 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973659664 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:46:03.166958 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973660080 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:46:04.046882 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973660960 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:46:05.710910 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973662624 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:46:09.039310 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973665952 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:46:16.017889 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973672930 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:46:29.326891 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973686240 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:46:55.950915 52:54:00:1d:a4:24 > 52:54:00:a0:93:93,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973712864 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:47:00.379844 52:54:00:13:1f:34 > Broadcast, ethertype
> > > ARP (0x0806), length 60: Reply 10.125.1.85 is-at 52:54:00:13:1f:34,
> > > length 46
> > >
> > > 2024-05-29 11:47:52.271192 52:54:00:1d:a4:24 > 52:54:00:13:1f:34,
> > > ethertype IPv4 (0x0800), length 190: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [P.], seq 129897:130021, ack 171633, win 2356,
> > > options [nop,nop,TS val 1973769184 ecr 28456
> > > 58566], length 124: NFS request xid 1954624602 120 access fh
> > > Unknown/43000001180100000000000000DE40020000000000F439000000000000000000
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
> > >
> > > 2024-05-29 11:47:52.272041 52:54:00:13:1f:34 > 52:54:00:1d:a4:24,
> > > ethertype IPv4 (0x0800), length 54: 10.125.1.85.nfs >
> > > 10.125.1.214.58428: Flags [R], seq 1148562527, win 0, length 0
> > >
> > > 2024-05-29 11:47:52.272909 52:54:00:1d:a4:24 > 52:54:00:13:1f:34,
> > > ethertype IPv4 (0x0800), length 74: 10.125.1.214.58428 >
> > > 10.125.1.85.nfs: Flags [S], seq 1734997801, win 32120, options [mss
> > > 1460,sackOK,TS val 1973769186 ecr 0,nop,wscale 7], length 0
> > >
> > > 2024-05-29 11:47:52.273503 52:54:00:13:1f:34 > 52:54:00:1d:a4:24,
> > > ethertype IPv4 (0x0800), length 74: 10.125.1.85.nfs >
> > > 10.125.1.214.58428: Flags [S.], seq 1078843840, ack 1734997802, win
> > > 28960, options [mss 1460,sackOK,TS val 2235915769 ecr
> > > 1973769186,nop,wscale 7], length 0
> > >
> > >
> > > Best regards,
> > > Zhitao Li
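
To make the backoff in the capture easier to see, here is a small sketch that computes the gap between successive sends of the same segment (seq 129897:130021) and the wait between the ARP update and the next retransmission. The timestamps are copied from the tcpdump entries above; the names SEND_TIMES, ARP_UPDATE, retransmit_gaps and wait_after are made up for this illustration and do not come from any tool or from the kernel.

```python
from datetime import datetime

# Send times of the repeated "NFS request xid 1954624602" segment,
# copied by hand from the tcpdump entries in this thread.
SEND_TIMES = [
    "2024-05-29 11:46:02.331891",
    "2024-05-29 11:46:02.542836",
    "2024-05-29 11:46:02.751013",
    "2024-05-29 11:46:03.166958",
    "2024-05-29 11:46:04.046882",
    "2024-05-29 11:46:05.710910",
    "2024-05-29 11:46:09.039310",
    "2024-05-29 11:46:16.017889",
    "2024-05-29 11:46:29.326891",
    "2024-05-29 11:46:55.950915",
    "2024-05-29 11:47:52.271192",
]

# Time of the gratuitous ARP reply announcing the VIP on the new node.
ARP_UPDATE = "2024-05-29 11:47:00.379844"

FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse(ts):
    """Parse a tcpdump-style timestamp string into a datetime."""
    return datetime.strptime(ts, FMT)


def retransmit_gaps(times):
    """Return the seconds elapsed between each pair of consecutive sends."""
    parsed = [parse(t) for t in times]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


def wait_after(event, times):
    """Return the seconds from `event` to the first send at or after it."""
    ev = parse(event)
    next_send = min(t for t in map(parse, times) if t >= ev)
    return (next_send - ev).total_seconds()


gaps = retransmit_gaps(SEND_TIMES)
for i, gap in enumerate(gaps, start=1):
    print(f"retransmit {i:2d}: {gap:7.3f} s after the previous send")

print(f"ARP update to next retransmit: {wait_after(ARP_UPDATE, SEND_TIMES):.3f} s")
```

On this capture the gaps roughly double each time, from about 0.2 s up to about 56 s, and the first retransmission after the ARP update arrives about 52 s later, which matches the 50s+ delay described in the original report.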