Re: NFS 4 Trunking load balancing and failover

> Also we noticed that when we take the first server IP down, the NFS
> session stalls. We hoped that the NFS client code would transparently use
> the second IP address. Is that planned for the future?

This is a very good question.  Half a year ago I had exactly the same problem.

It looks like the current NFS4 trunking is good for load balancing,
but not for failover.
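
For what it's worth, you can see how the client spreads RPCs over the
trunked connections from the per-transport counters in mountstats (a
minimal check, assuming a Linux client and the mounts from your commands):

# each "xprt:" line belongs to one TCP connection of the NFS mount listed
# above it and carries per-connection RPC send/receive counters, so an
# idle or barely used connection is easy to spot
grep -E 'mounted on|xprt:' /proc/self/mountstats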

If there are N trunked links between the NFS4 server and the client,
losing any single link effectively means losing the other N-1 links as
well: the whole session stalls.
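
A quick way to reproduce it (a rough sketch, assuming the addresses and
mounts from your mail; eth1 below is just a placeholder for the client
interface that carries the 10.0.10.0/24 path):

# keep some I/O running on the trunked mount ...
dd if=/dev/zero of=/mnt/failover-test bs=1M count=10000 oflag=direct &
# ... then take the first path down on the client side
ip link set eth1 down
# if failover worked, the dd would keep making progress via 10.0.20.48;
# instead the I/O stalls, exactly as you describe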

Anton



On Wed, 22 Jan 2025 at 06:54, Thomas Glanzmann <thomas@xxxxxxxxxxxx> wrote:
>
> Hello,
> we tried to use nconnect and link trunking to access a NetApp NFS filer from a
> Debian Linux kernel 6.12.9 system with the following commands:
>
> mount -o nconnect=8,max_connect=16 10.0.10.48:/vol41 /mnt
> mount -o nconnect=8,max_connect=16 10.0.20.48:/vol41 /mnt
>
> root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.10.48:/vol41 /mnt
> root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.20.48:/vol41 /mnt
> root@debian-08:~# netstat -an | grep 2049
> tcp        0      0 10.0.10.28:834          10.0.10.48:2049         ESTABLISHED
> tcp        0      0 10.0.10.28:826          10.0.10.48:2049         ESTABLISHED
> tcp        0      0 10.0.10.28:951          10.0.10.48:2049         ESTABLISHED
> tcp        0      0 10.0.10.28:707          10.0.10.48:2049         ESTABLISHED
> tcp        0      0 10.0.10.28:853          10.0.10.48:2049         ESTABLISHED
> tcp        0      0 10.0.10.28:914          10.0.10.48:2049         ESTABLISHED
> tcp        0      0 10.0.20.28:862          10.0.20.48:2049         ESTABLISHED
> tcp        0      0 10.0.20.28:771          10.0.20.48:2049         TIME_WAIT
> tcp        0      0 10.0.10.28:844          10.0.10.48:2049         ESTABLISHED
> tcp        0      0 10.0.10.28:980          10.0.10.48:2049         ESTABLISHED
>
> On the NetApp you can see that the traffic is unevenly distributed over the
> two links:
>
> n2 : 1/22/2025 04:38:58
>                                   *Recv                  Sent
>                          Recv      Data   Recv   Sent    Data   Sent Current
>  LIF           Vserver Packet     (Bps) Errors Packet   (Bps) Errors    Port
> ---- ----------------- ------ --------- ------ ------ ------- ------ -------
> nfs1 frontend-08-nfs41  26865 905471818      0  13786 1599403      0  e0e-10
> nfs2 frontend-08-nfs41   3952 114124809      0   1737  201578      0  e0f-20
>
> While that works, we noticed that eight TCP connections are established to the
> first IP address but only one to the second. When generating load, we can see
> that the majority of the NFS traffic goes to the first IP. Is there a way to
> have more TCP connections established to the second IP?
>
> Also we noticed that when we take the first server IP down, the NFS
> session stalls. We hoped that the NFS client code would transparently use
> the second IP address. Is that planned for the future?
>
> I also tried the above with the VMware ESX hypervisor, using the most
> recent version 8.0 Update 3c. There the traffic is equally distributed
> across the two links, and when taking down one of the two links, the I/O
> continues.
>
> Our setup: We have a NetApp AFF A150. Its controllers are connected to a
> Linux VM with two 10 Gbit/s links, and the VM likewise has two dedicated
> 10 Gbit/s links. To keep the traffic on separate paths we use two VLANs,
> so in effect there are two dedicated 10 Gbit/s links between the Linux VM
> and the NetApp.
>
> We also noticed that we get the best possible performance from Linux to the
> NetApp filer over one path using the following mount options:
>
> -o vers=3,nconnect=16
>
> With that setup we can get 150k 4k random IOPS at a queue depth of 256
> (4 threads with a queue depth of 64 each). This maxes out a 10 Gbit/s link
> with 4k random I/Os, and it also maxes out the CPU of our NetApp controller.
> The disks are 25-50% busy (16 4 TB SSDs).
>
> We used the following fio commands to generate load (nproc = 4):
>
> # high queue depth:
> fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=64 --readwrite=write --unlink=1
> fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=256 --readwrite=randwrite --unlink=1
> fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=64 --readwrite=read --unlink=1
> fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=256 --readwrite=randread --unlink=1
>
> # 1 qd:
> fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=1 --readwrite=write --unlink=1
> fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=1 --readwrite=randwrite --unlink=1
> fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=1 --readwrite=read --unlink=1
> fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=1 --readwrite=randread --unlink=1
>
> Cheers,
>         Thomas
>




