NFSv4 trunking: load balancing and failover

Hello,
we tried to use nconnect and trunking to access a NetApp NFS filer using a
Debian Linux kernel 6.12.9 with the following commands:

root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.10.48:/vol41 /mnt
root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.20.48:/vol41 /mnt
root@debian-08:~# netstat -an | grep 2049
tcp        0      0 10.0.10.28:834          10.0.10.48:2049         ESTABLISHED
tcp        0      0 10.0.10.28:826          10.0.10.48:2049         ESTABLISHED
tcp        0      0 10.0.10.28:951          10.0.10.48:2049         ESTABLISHED
tcp        0      0 10.0.10.28:707          10.0.10.48:2049         ESTABLISHED
tcp        0      0 10.0.10.28:853          10.0.10.48:2049         ESTABLISHED
tcp        0      0 10.0.10.28:914          10.0.10.48:2049         ESTABLISHED
tcp        0      0 10.0.20.28:862          10.0.20.48:2049         ESTABLISHED
tcp        0      0 10.0.20.28:771          10.0.20.48:2049         TIME_WAIT
tcp        0      0 10.0.10.28:844          10.0.10.48:2049         ESTABLISHED
tcp        0      0 10.0.10.28:980          10.0.10.48:2049         ESTABLISHED
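
A quick way to count the established connections per server IP (standard
tools only):

netstat -an | awk '$5 ~ /:2049$/ && $6 == "ESTABLISHED" {split($5, a, ":"); print a[1]}' | sort | uniq -c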

On the NetApp you can see that the traffic is distributed unequally over
the two links:

n2 : 1/22/2025 04:38:58
                                  *Recv                  Sent
                         Recv      Data   Recv   Sent    Data   Sent Current
 LIF           Vserver Packet     (Bps) Errors Packet   (Bps) Errors    Port
---- ----------------- ------ --------- ------ ------ ------- ------ -------
nfs1 frontend-08-nfs41  26865 905471818      0  13786 1599403      0  e0e-10
nfs2 frontend-08-nfs41   3952 114124809      0   1737  201578      0  e0f-20

While that works, we noticed that eight TCP connections are established
to the first IP address and only one to the second. When generating load
we can see that the majority of the NFS traffic goes to the first IP. Is
there a way to have more TCP connections established to the second IP?
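
For reference, on recent kernels the client's transports can be listed
through the sunrpc sysfs tree, which shows which destination each xprt
in the switch points at (a sketch, assuming the sysfs interface added
around 5.17 is enabled; the exact directory names may differ):

# list every transport of every xprt-switch with its destination address
for x in /sys/class/sunrpc/xprt-switch-*/xprt-*; do
        [ -d "$x" ] || continue   # skip plain attribute files
        printf '%s: ' "$x"; cat "$x/dstaddr" 2>/dev/null
done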

We also noticed that when we take the first server IP down, the NFS
session stalls. We had hoped that the NFS client code would transparently
fail over to the second IP address. Is that planned for the future?
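
The same sysfs tree looks like it would allow repointing a transport by
hand; a rough, untested sketch (the xprt path below is illustrative, and
we are assuming the xprt_state/dstaddr attributes are writable as
documented):

# take the transport to the dead address offline, repoint it, bring it back
X=/sys/class/sunrpc/xprt-switch-0/xprt-1   # example path, adjust to your system
echo offline    > $X/xprt_state
echo 10.0.20.48 > $X/dstaddr
echo online     > $X/xprt_state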

I also tried the above with the VMware ESXi hypervisor, on the most
recent version 8.0 Update 3c. There the traffic is distributed equally
across the two links, and when one of the two links is taken down, I/O
continues.

Our setup: We have a NetApp AFF A150. The controllers of the AFF A150
are connected to a Linux VM over two 10 Gbit/s links; the VM likewise
uses two dedicated 10 Gbit/s links. To direct the traffic we use two
VLANs, one per link, so we end up with two dedicated 10 Gbit/s paths
between the Linux VM and the NetApp.

We also noticed that we get the best performance from Linux to the
NetApp filer over a single path with the following mount options:

-o vers=3,nconnect=16
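
For example, as a full command line against the same export as above:

mount -t nfs -o vers=3,nconnect=16 10.0.10.48:/vol41 /mnt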

With that setup we get 150k 4k random IOPS at a queue depth of 256
(4 threads with an iodepth of 64 each). This maxes out a 10 Gbit/s link
with 4k random I/Os, and it also maxes out the CPU of our NetApp
controller. The disks (16 x 4 TB SSDs) are 25-50% busy.

We used the following fio commands to generate load (nproc = 4).

# high queue depth:
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=64 --readwrite=write --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=256 --readwrite=randwrite --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=64 --readwrite=read --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=256 --readwrite=randread --unlink=1

# queue depth 1:
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=1 --readwrite=write --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=1 --readwrite=randwrite --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=1 --readwrite=read --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=1 --readwrite=randread --unlink=1
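
To watch how the load spreads across the two links while fio runs, a
per-interface throughput view on the client is enough (sar from the
sysstat package; interface names will vary with your setup):

sar -n DEV 1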

Cheers,
        Thomas



