Hi,
On 12/16/24 1:34 AM, Trond Myklebust wrote:
On Sun, 2024-12-15 at 13:38 +0100, Rik Theys wrote:
Hi,
We are experiencing an issue on our Rocky 9 NFS server and Rocky 8,
Rocky 9 and Fedora 41 clients.
The server is (now) running upstream Linux 6.11.11 and the Fedora 41
clients are running the Fedora 6.11.11 kernel. The Rocky 8 and 9
machines are running the latest Rocky 8/9 kernels.
Suddenly, a number of clients start sending an abnormal amount of NFS
traffic to the server; it saturates their link and never seems to
stop.
Running iotop on the clients shows kworker-{rpciod,nfsiod,xprtiod}
processes generating the write traffic. On the server side, the system
does seem to process the traffic, as the disks are busy handling the
write requests.
This behavior continues even after stopping all user processes on the
clients and unmounting the NFS filesystem on the clients. Is this
normal? I was under the impression that once the NFS mount is
unmounted, no further traffic to the server should be visible.
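One way to confirm that a client really keeps issuing RPCs after the
unmount is to sample the client-side NFS RPC call counter twice and
compare. This is only a sketch: it assumes the 'rpc' line of
/proc/net/rpc/nfs has the layout 'rpc <calls> <retrans> <authrefresh>',
and the helper names read_rpc_calls and calls_per_second are
hypothetical, not part of nfs-utils or any other tool.

```python
import time


def read_rpc_calls(text: str) -> int:
    """Return the total RPC call count from the 'rpc' line of
    /proc/net/rpc/nfs (assumed fields: calls retrans authrefresh)."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "rpc":
            return int(fields[1])
    raise ValueError("no 'rpc' line found")


def calls_per_second(path: str = "/proc/net/rpc/nfs", interval: float = 5.0) -> float:
    """Sample the client RPC call counter twice, `interval` seconds
    apart, and return the average number of calls per second."""
    with open(path) as f:
        before = read_rpc_calls(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_rpc_calls(f.read())
    return (after - before) / interval
```

On a client with no other NFS mounts active, calls_per_second() run
after the unmount would be expected to return about 0; a steadily
positive rate means RPCs are still going out.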
Not all clients seem to trigger this issue. On a Fedora 41 client that
(auto)mounts home directories from the NFS server, the behavior seems to
be triggered when I start Thunderbird and let it process a lot of new
mail (mail from the IMAP server is stored in the Thunderbird cache,
which lives in the NFS-mounted home directory). This triggers the high
write traffic from the kworker threads. At first, Thunderbird behaves
normally, but it gets really slow over time. Stopping Thunderbird does
not stop the kworker threads, and they keep sending a lot of traffic to
the server.
Can you point me to some steps to diagnose this further? Where can I
find what triggers the creation of these kworker threads? Why does iotop
attribute the write traffic to these threads and not to the Thunderbird
threads?
There haven't been many changes to our kernels on the Rocky side
recently. Is it possible that a Fedora 41 client running a more recent
kernel somehow triggers a behavior on the server that causes the Rocky
clients to start misbehaving?
Which operations are the clients sending to the server? Ideally you'll
want to look at a wireshark trace to see what is being sent on the
wire, but it might be sufficient to watch the 'nfsstat' output on both
the clients and server to see what is anomalous or different about the
traffic when the issue is occurring.
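A rough sketch of the server-side version of this check: sample the
NFSv4 per-operation counters twice and turn the difference into
per-second rates for a few operations. It assumes the 'proc4ops' line
of /proc/net/rpc/nfsd is 'proc4ops <n>' followed by one counter per
operation number, and uses the NFSv4.1 opcodes for GETATTR (9),
PUTFH (22) and SEQUENCE (53); parse_proc4ops, op_rates and sample are
hypothetical names.

```python
import time

# NFSv4.1 operation numbers (RFC 5661 / RFC 8881) for the ops of interest.
OPCODES = {"GETATTR": 9, "PUTFH": 22, "SEQUENCE": 53}


def parse_proc4ops(text: str) -> list[int]:
    """Extract per-opcode counters from the 'proc4ops' line of
    /proc/net/rpc/nfsd. Assumed layout: 'proc4ops <n> <c0> ... <c(n-1)>',
    where c<i> counts operations with opcode i."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "proc4ops":
            n = int(fields[1])
            return [int(x) for x in fields[2:2 + n]]
    raise ValueError("no 'proc4ops' line found")


def op_rates(before: list[int], after: list[int], interval: float) -> dict[str, float]:
    """Per-second rate of each op in OPCODES between two counter snapshots."""
    return {name: (after[op] - before[op]) / interval for name, op in OPCODES.items()}


def sample(path: str = "/proc/net/rpc/nfsd", interval: float = 10.0) -> dict[str, float]:
    """Read the server counters twice, `interval` seconds apart."""
    with open(path) as f:
        before = parse_proc4ops(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc4ops(f.read())
    return op_rates(before, after, interval)
```

Running sample() once while the issue is occurring and once during
normal operation gives two sets of rates that can be compared directly.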
Looking at our collectd statistics from the time the issue happened, the
operations were sequence, putfh and getattr. Not all clients show this
behavior, but the number of sequence and putfh requests the server
receives is 3x higher than the typical average, and the number of
getattr requests is 6x higher.
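The comparison against the typical average boils down to dividing each
operation's current rate by the mean of its historical samples and
flagging anything above a threshold. A minimal sketch follows; the
function name anomaly_ratios, the 2x threshold and the numbers in the
usage lines are illustrative assumptions, not values taken from the
collectd data.

```python
from statistics import mean


def anomaly_ratios(baseline: dict[str, list[float]],
                   current: dict[str, float],
                   threshold: float = 2.0) -> dict[str, float]:
    """Return current/average ratios for ops at or above `threshold`.
    Ops with no history or a zero average are skipped."""
    flagged = {}
    for op, rate in current.items():
        history = baseline.get(op)
        if not history:
            continue
        avg = mean(history)
        if avg > 0 and rate / avg >= threshold:
            flagged[op] = rate / avg
    return flagged


# Made-up rates (ops/s), chosen only to mirror the reported 3x/6x pattern.
baseline = {"SEQUENCE": [100, 110, 90], "PUTFH": [100, 95, 105], "GETATTR": [50, 55, 45]}
current = {"SEQUENCE": 300.0, "PUTFH": 300.0, "GETATTR": 300.0}
print(anomaly_ratios(baseline, current))  # {'SEQUENCE': 3.0, 'PUTFH': 3.0, 'GETATTR': 6.0}
```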
Regards,
Rik
--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>