[NFS Session Trunking] Failure to Function When One Connection is Disconnected

Chengen Du <chengen.du@xxxxxxxxxxxxx> · Tue, 21 Jan 2025 12:24:42 +0800

Hi,

We have a customer experiencing an issue with session trunking.
The NFS client uses two connections, each to a different IP address
on the NFS server, set up as follows:
root@nfs-client:~# mount -t nfs -o vers=4.1,max_connect=2 192.168.122.77:/share /mnt
root@nfs-client:~# mount -t nfs -o vers=4.1,max_connect=2 192.168.122.13:/share /mnt

If one of the connections is disconnected, network traffic ceases on
both links, and access to the NFS share is no longer possible.

I have conducted a preliminary analysis and would greatly appreciate
additional opinions on this matter.
NFS relies on the Linux SUNRPC subsystem to facilitate communication
between the client and server.
When session trunking is enabled, multiple transport handles can be
identified using the `rpcctl xprt` command:
root@nfs-client:~# rpcctl xprt
xprt-0: tcp, 192.168.122.13, port 2049, state <CONNECTED,BOUND>
Source: 192.168.122.138, port 1023, Requests: 2
Congestion: cur 0, win 256, Slots: min 2, max 65536
Queues: binding 0, sending 0, pending 0, backlog 0, tasks 0
xprt-1: tcp, 192.168.122.77, port 2049, state <CONNECTED,BOUND>, main
Source: 192.168.122.138, port 1000, Requests: 2
Congestion: cur 0, win 256, Slots: min 2, max 65536
Queues: binding 0, sending 0, pending 0, backlog 0, tasks 0

When the client accesses an NFS share, the rpc_run_task() function is
called to handle RPC operations.
Within this function, rpc_task_set_transport() is invoked to select a
transport for communication.
The transport selection occurs in a round-robin fashion, which may
result in a broken connection being chosen.
This can cause the RPC operation to block until the connection is restored.
However, a transport that has the XPRT_OFFLINE flag set is not
selected, and the flag can be set through the transport's sysfs entry
(xprt_state).
After marking the disconnected transport as offline, we confirmed
that the NFS share works as expected again.
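
To illustrate the selection behaviour described above, here is a small
user-space model. It is only a sketch and not the kernel code: the
Transport and RoundRobinSwitch names are hypothetical, and the model
assumes that selection looks at the offline flag but not at whether
the connection is actually alive.

```python
from dataclasses import dataclass


@dataclass
class Transport:
    # Hypothetical stand-in for a SUNRPC transport, not the kernel struct.
    dstaddr: str
    connected: bool = True   # whether the link really works
    offline: bool = False    # models the XPRT_OFFLINE flag


class RoundRobinSwitch:
    """Toy model of round-robin transport selection."""

    def __init__(self, transports):
        self.transports = list(transports)
        self._next = 0

    def pick(self):
        # Visit each transport at most once, starting at the slot after
        # the previous pick. Only the offline flag is checked; the
        # connected field is deliberately ignored, as in the report.
        n = len(self.transports)
        for step in range(n):
            idx = (self._next + step) % n
            xprt = self.transports[idx]
            if not xprt.offline:
                self._next = (idx + 1) % n
                return xprt
        return None  # every transport is marked offline


def main():
    broken = Transport("192.168.122.13", connected=False)
    healthy = Transport("192.168.122.77")
    switch = RoundRobinSwitch([broken, healthy])

    # Without the flag, every other request lands on the dead link.
    for _ in range(4):
        print("before:", switch.pick().dstaddr)

    # With the flag set, only the healthy transport is chosen.
    broken.offline = True
    for _ in range(4):
        print("after: ", switch.pick().dstaddr)


if __name__ == "__main__":
    main()
```

In this model, a request that lands on the broken transport would be
the one that blocks, which matches what we observe on the client.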

[NFS Server]
root@nfs-server:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:af:98:14 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.46/24 metric 100 brd 192.168.122.255 scope global dynamic enp1s0
       valid_lft 2369sec preferred_lft 2369sec
    inet6 fe80::5054:ff:feaf:9814/64 scope link
       valid_lft forever preferred_lft forever
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:a5:9f:e6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.77/24 metric 100 brd 192.168.122.255 scope global dynamic enp7s0
       valid_lft 3591sec preferred_lft 3591sec
    inet6 fe80::5054:ff:fea5:9fe6/64 scope link
       valid_lft forever preferred_lft forever
4: enp8s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 52:54:00:91:20:32 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe91:2032/64 scope link
       valid_lft forever preferred_lft forever

[NFS Client]
root@nfs-client:/sys/kernel/sunrpc/xprt-switches/switch-0/xprt-0-tcp# cat dstaddr
192.168.122.13 (The IP address assigned to the enp8s0 interface on the NFS server.)
root@nfs-client:/sys/kernel/sunrpc/xprt-switches/switch-0/xprt-0-tcp# echo offline > xprt_state
root@nfs-client:/sys/kernel/sunrpc/xprt-switches/switch-0/xprt-0-tcp# cat xprt_state
state= CONNECTED   BOUND      OFFLINE
root@nfs-client:/sys/kernel/sunrpc/xprt-switches/switch-0/xprt-0-tcp# mount | grep share
192.168.122.77:/share on /mnt type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,max_connect=2,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.138,local_lock=none,addr=192.168.122.77)
root@nfs-client:/sys/kernel/sunrpc/xprt-switches/switch-0/xprt-0-tcp# ls /mnt/
rw.test
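
For reference, the manual steps above could be scripted roughly as
follows. This is a sketch only: the sysfs layout (switch-*/xprt-*
directories containing dstaddr and xprt_state) is taken from the
session above and may differ on other kernels, the helper names are
made up for illustration, and writing to xprt_state requires root.

```python
from pathlib import Path

# Location seen in the session above; assumed, not guaranteed, elsewhere.
SYSFS_SWITCHES = Path("/sys/kernel/sunrpc/xprt-switches")


def find_xprt_dirs(dstaddr, root=SYSFS_SWITCHES):
    """Return the transport directories whose dstaddr file equals dstaddr."""
    matches = []
    for addr_file in sorted(Path(root).glob("switch-*/xprt-*/dstaddr")):
        if addr_file.read_text().strip() == dstaddr:
            matches.append(addr_file.parent)
    return matches


def set_offline(xprt_dir):
    """Equivalent of `echo offline > xprt_state` in that directory."""
    (Path(xprt_dir) / "xprt_state").write_text("offline\n")


def is_offline(xprt_dir):
    """Check whether OFFLINE is among the flags reported by xprt_state."""
    text = (Path(xprt_dir) / "xprt_state").read_text()
    # Expected form: "state= CONNECTED   BOUND      OFFLINE"
    flags = text.partition("=")[2].split()
    return "OFFLINE" in flags


def main():
    # 192.168.122.13 is the address of the disconnected link in this report.
    for xprt_dir in find_xprt_dirs("192.168.122.13"):
        set_offline(xprt_dir)
        print(xprt_dir.name, "offline:", is_offline(xprt_dir))


if __name__ == "__main__":
    main()
```

The root parameter exists only so the helpers can be exercised against
a scratch directory instead of the real sysfs tree.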

However, this workaround is not practical for users: it has to be
applied manually, and the state of the main transport cannot be
modified through sysfs.
While the XPRT_OFFLINE flag is helpful, the NFS client does not
offer an automated way to set it.

I understand that this behavior may not have been explicitly defined in the RFC.
A possible solution could involve detecting the link status before
processing RPC operations and using the XPRT_OFFLINE flag to control
the behavior.
Alternatively, introducing a new method in the SUNRPC subsystem to
constrain transport targets by providing a list of candidates might
also be effective.
As I am not an expert in this area, I would greatly appreciate any
insights or suggestions regarding this issue.
If this is identified as an issue requiring a fix, I would be
delighted to contribute to its resolution.
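
To make the first idea a little more concrete, below is a rough
user-space analogue of "check the link, then decide the transport
state". It is a sketch under assumptions, not a proposed patch: the
function names are hypothetical, a plain TCP connect to port 2049 is
used as a stand-in for a real link check, and the "online"/"offline"
strings are assumed to be the values a monitor would write to
xprt_state.

```python
import socket


def tcp_reachable(addr, port=2049, timeout=2.0):
    """Return True if a TCP connection to addr:port succeeds in time."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False


def plan_states(dstaddrs, probe=tcp_reachable):
    """Map each destination address to the state a monitor would request."""
    plan = {}
    for addr in dstaddrs:
        plan[addr] = "online" if probe(addr) else "offline"
    return plan
```

The probe is passed in as a parameter so that a different check (for
example, one based on interface carrier state) could be substituted
without changing the decision logic.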

Your feedback on this matter is highly valued.

Best regards,
Chengen Du