Hello.

I'm having a problem with my NFS share. It has been present for some time now. Kernel versions have been upgraded and the setup has changed since then, but because I can't remember for sure when it started, I'll just describe my current configuration. First, the problem itself.

Periodically (roughly every GB of data or so), reads from the NFS share hang. During the hang an nfsd kernel thread on the server eats CPU, but no data is sent to the client. After about a minute everything returns to normal on its own, with no action required on my part.

The first thing I did to debug this was to enable verbose output in all userspace daemons on both the client and the server. They produced no output whatsoever during the problematic period. Next, I dumped network traffic on TCP port 2049 on both the client and the server. There were no packet drops or anything else unusual, except that the client restarted its TCP connection to port 2049 after a minute of silence from the server, after which data started flowing again. This was confirmed by kernel debug output on the client (echo 65535 | tee nfs_debug nfsd_debug nlm_debug rpc_debug): the NFS client sent the server a READ request with a 60-second timeout, the timeout expired, and the client dropped and re-established the NFS TCP connection. So this points to the NFS server kernel code.

Kernel debug output on the server is quite large and spikes during the hangs. I've attached a version of it, deduplicated by hand, to this email. I couldn't find anything strange in it, but I don't understand most of it anyway.

My current setup: both server and client run Linux 3.3.3, using NFSv4 with sec=krb5 over the local network 192.168.0.0/24 with no firewalls (the client has iptables disabled in the kernel, and the server ACCEPTs everything from the internal interface). The client uses Wi-Fi and the server uses Ethernet with VLANs, so traffic goes through the AP. Since the network dumps on the server and the client are identical, IMHO the network configuration is irrelevant; I included it only for completeness.
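For anyone trying to reproduce the debug capture, here is a minimal sketch of toggling the same debug masks with the `echo | tee` command quoted above. It assumes the debug files live in the usual sysctl directory `/proc/sys/sunrpc` and must be run as root there. The variable `SUNRPC_DIR` and the function `set_rpc_debug` are hypothetical names for illustration only, not part of any existing tool.

```shell
#!/bin/sh
# Sketch only: toggle the NFS/RPC debug masks mentioned in the report.
# Assumption: the debug files are in /proc/sys/sunrpc (the usual location).
# SUNRPC_DIR is a hypothetical override, handy for a dry run in a scratch dir.
SUNRPC_DIR=${SUNRPC_DIR:-/proc/sys/sunrpc}

# Hypothetical helper: write the same mask to all four debug files.
# 65535 (0xFFFF) sets the low 16 bits, which should switch on every flag
# for these facilities; 0 turns debugging off again.
set_rpc_debug() {
    (cd "$SUNRPC_DIR" && echo "$1" | tee nfs_debug nfsd_debug nlm_debug rpc_debug >/dev/null)
}
```

Typical use would be `set_rpc_debug 65535` just before reproducing the hang and `set_rpc_debug 0` right afterwards, since full debugging floods the kernel log very quickly.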
The most common use of this NFS share (and my test case) is watching videos with mplayer. The underlying filesystem is XFS (though ext4 is used for other shares on the same server). I'm ready to provide additional information or test patches, since this problem is quite annoying (and IMHO has gotten worse over time).
Attachment:
nfs_debug.syslog