Description of problem:
Periodically, and with no obvious cause, all NFS connections between our Debian Testing (_Squeeze_) x86 client (a diskless node that uses nfsroot and boots from the server) and our Debian Testing (_Squeeze_) x86 server hang, and dmesg on the client reports that the server is "not responding". The server keeps answering every other client's requests. Restarting nfsd on the server does not appear to solve the problem. At first I was not able to capture any debug information because /var/log was itself mounted over NFS, so I installed a hard drive in the client and mounted only /var/log on it in order to capture debug logs from the client side as well.

Debug Logs (a sketch of how this kind of kernel RPC logging can be enabled is at the end of this mail):
http://fixity.net/tmp/client.log.gz - kernel RPC debug log from the client
http://fixity.net/tmp/server.log.gz - kernel RPC debug log from the server

How reproducible:
Happens 10 to 90 minutes after booting the diskless node.

Actual results:
NFS connections stop responding and the client hangs or becomes very slow and unresponsive (it does not react to Ctrl+Alt+Del either). 60 to 90 minutes after the first server timeout the client logs that the server is OK again, but the client remains unresponsive. Immediately after that the client logs the loss of the server connection again, which leads to a continuous loop, and the client stays unresponsive. Sometimes the client resumes normal operation for a couple of hours, but then the problem repeats.

Connectivity info:
Both the client and the server are connected to a Gigabit Ethernet Cisco Metro series managed switch. Both use Intel Pro 82545GM Gigabit Ethernet server controllers. Neither machine logs any Ethernet errors, and none are logged by the switch.

Expected results:
NFS connections continue to function and do not fail like clockwork while every other client on the network has no issues.

Client & Server Load:
For the purposes of testing, both machines were running only the required daemons and were not loaded at all.

Client & Server Kernel:
A custom-compiled Linux 2.6.29.3 kernel was used on both the client and the server. Configuration file: http://fixity.net/tmp/config-2.6.29.3.gz

Client & Server network interface fragmented packet queue length:
net.ipv4.ipfrag_high_thresh = 524288
net.ipv4.ipfrag_low_thresh = 393216

Client Versions:
libnfsidmap2/squeeze uptodate 0.21-2
nfs-common/squeeze   uptodate 1:1.1.4-1

Client Mount (cat /proc/mounts | grep nfsroot):
10.11.11.1:/nfsroot / nfs rw,vers=3,rsize=524288,wsize=524288,namlen=255,hard,nointr,nolock,proto=tcp,timeo=7,retrans=10,sec=sys,addr=10.11.11.1 0 0

Client fstab:
proc      /proc      proc   defaults  0 0
/dev/nfs  /          nfs    defaults  1 1
none      /tmp       tmpfs  defaults  0 0
none      /var/run   tmpfs  defaults  0 0
none      /var/lock  tmpfs  defaults  0 0
none      /var/tmp   tmpfs  defaults  0 0

Client Daemons:
portmap, rpc.statd, rpc.idmapd

Server Daemons:
portmap, rpc.statd, rpc.idmapd, rpc.mountd --manage-gids

Server Versions:
libnfsidmap2/squeeze      uptodate 0.21-2
nfs-common/squeeze        uptodate 1:1.1.4-1
nfs-kernel-server/testing uptodate 1:1.1.4-1

Server Export:
/nfsroot 10.11.11.*(rw,no_root_squash,async,no_subtree_check)

Server Options:
RPCNFSDCOUNT=16
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS=--manage-gids
NEED_SVCGSSD=no
RPCSVCGSSDOPTS=no

Additional Info:
Since I have read that tweaking the nfsroot mount options could improve the situation, I have tested different combinations of the following options (one way of trying such a combination is sketched below):
rsize/wsize = 1024 | 2048 | 4096 | 8192 | 32768 | 524288
timeo       = 15 | 60 | 600
retrans     = 3 | 10 | 20
None of them solved the problem.

Any help or suggestions on fixing the problem would be highly appreciated.
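In case it helps anyone reproduce the option matrix above: an individual combination can be tried from any other machine against the same export with a plain NFSv3 mount, for example (the mount point /mnt/test is just a placeholder, and the values shown are one of the combinations listed above, not a recommendation):

    mount -t nfs -o vers=3,proto=tcp,hard,nolock,rsize=8192,wsize=8192,timeo=60,retrans=10 \
        10.11.11.1:/nfsroot /mnt/test

On the diskless client itself the equivalent options were of course applied to the root mount rather than to a separate test mount.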
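Regarding the debug logs linked above: kernel RPC debugging on the client and the server can be switched on and off with rpcdebug from nfs-utils, roughly as follows (the flag sets here are only illustrative; "all" is the noisiest setting):

    # on the client
    rpcdebug -m rpc -s all      # sunrpc layer
    rpcdebug -m nfs -s all      # NFS client

    # on the server
    rpcdebug -m rpc -s all      # sunrpc layer
    rpcdebug -m nfsd -s all     # NFS server

    # turn it off again once the hang has been captured
    rpcdebug -m rpc -c all
    rpcdebug -m nfs -c all      # client
    rpcdebug -m nfsd -c all     # server

The messages end up in the kernel log, which is why /var/log had to be moved off NFS first.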
I have been messing with this problem for the last couple of weeks and have run out of ideas.

Best Regards,
Jerome Walters