Hi,

On 12/16/24 1:34 AM, Trond Myklebust wrote:
> On Sun, 2024-12-15 at 13:38 +0100, Rik Theys wrote:
>> Hi,
>>
>> We are experiencing an issue on our Rocky 9 NFS server and Rocky 8, Rocky 9 and Fedora 41 clients. The server is (now) running upstream Linux 6.11.11 and the Fedora 41 clients are running the Fedora 6.11.11 kernel. The Rocky 8 and 9 machines are running the latest Rocky 8/9 kernels.
>>
>> Suddenly, a number of clients start to send an abnormal amount of NFS traffic to the server that saturates their link and never seems to stop. Running iotop on the clients shows kworker-{rpciod,nfsiod,xprtiod} processes generating the write traffic. On the server side, the system seems to process the traffic as the disks are processing the write requests.
>>
>> This behavior continues even after stopping all user processes on the clients and unmounting the NFS mount on the client. Is this normal? I was under the impression that once the NFS mount is unmounted no further traffic to the server should be visible?
>>
>> Not all clients seem to trigger this issue. On a Fedora 41 client that (auto)mounts home directories from the NFS server the behavior seems to be triggered when I start Thunderbird and let it process a lot of new mail (mail from the IMAP server is stored in the thunderbird cache that's stored in the nfs-mounted home directory). This triggers the high write traffic of the kworker threads. At first, thunderbird behaves normally but gets really slow over time. Stopping thunderbird does not stop the kworker threads and they keep sending a lot of traffic to the server.
>>
>> Can you point me to some steps to further diagnose this? Where can I find what triggers the creation of these kworker threads? Why does iotop show the write traffic with these threads, and not the thunderbird threads?
>>
>> There haven't been many changes to our kernels on the Rocky side recently. Is it possible a Fedora 41 client running a more recent kernel somehow triggers a behavior on the server that causes Rocky clients to start misbehaving?
>
> Which operations are the clients sending to the server? Ideally you'll want to look at a wireshark trace to see what is being sent on the wire, but it might be sufficient to watch the 'nfsstat' output on both the clients and server to see what is anomalous or different about the traffic when the issue is occurring.

When I look at the captures I've taken, wireshark doesn't seem to recognize the traffic as NFS or RPC. I've attached a small portion of a capture. I will try to find out how to force wireshark to identify the traffic as NFS.

The traffic seems to be generated by the kworker threads I've mentioned above. Is it normal for a client to keep sending traffic using these threads to an NFS server even if there isn't an active mount anymore? Is the callback channel kept open between the client and server if a mount is unmounted?
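
One way to act on the nfsstat suggestion quoted above is to diff the raw client counters over a short interval and see which ones grow while the kworker threads are busy. The following is only a minimal sketch, assuming the client exposes its RPC/NFS counters in /proc/net/rpc/nfs as lines of a label followed by integers; the helper names (parse_rpc_stats, counter_deltas, sample_deltas) are hypothetical, and mapping a counter position back to an operation name depends on the kernel, so it is left out here.

```python
import time
from pathlib import Path


def parse_rpc_stats(text):
    """Map each line label (e.g. 'rpc', 'proc4') to its list of integer counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = [int(value) for value in fields[1:]]
        except ValueError:
            # Skip lines that are not purely numeric after the label.
            continue
    return stats


def counter_deltas(before, after):
    """Return {label: {position: increase}} for every counter that changed."""
    deltas = {}
    for label, new in after.items():
        old = before.get(label)
        if old is None or len(old) != len(new):
            continue
        changed = {i: n - o for i, (o, n) in enumerate(zip(old, new)) if n != o}
        if changed:
            deltas[label] = changed
    return deltas


def sample_deltas(path="/proc/net/rpc/nfs", interval=10.0):
    """Read the counters twice, `interval` seconds apart, and diff them.

    The default path is an assumption about where the client counters live.
    """
    stats_file = Path(path)
    first = parse_rpc_stats(stats_file.read_text())
    time.sleep(interval)
    second = parse_rpc_stats(stats_file.read_text())
    return counter_deltas(first, second)
```

Calling sample_deltas() on an affected client and on a healthy one, then comparing which positions grow fastest, should narrow down which operations make up the runaway traffic.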

Regards,
Rik

-- 
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>

Attachment: partial.pcap
Description: application/vnd.tcpdump.pcap