On Mon, 2009-02-16 at 13:11 +0200, Arto Jantunen wrote:
> (I'm not subscribed, so please CC me on any replies)
>
> I seem to have hit a NFS bug while upgrading a machine from Debian
> Etch to Debian Lenny. I have a NFS server running FreeBSD 7.0 RC1 and
> a bunch of clients running Linux. The ones running kernel 2.6.18 work
> perfectly, as do the ones running 2.6.24. The one I upgraded to 2.6.26
> fails. After 5-15 minutes of working normally the mount dies and I get
> the usual "nfs: server <server> not responding, still trying" in
> dmesg. The only way I have found to get the mount back is umount -f &&
> mount, waiting does not bring it back.
>
> I have tested quite a bunch of different kernel versions, and starting
> from 25 and ending at the git tree last week they all fail in the same
> way. Bisecting tracks the problem to commit
> e06799f958bf7f9f8fae15f0c6f519953fb0257c
>
> I originally thought that it was the same as bug 11154, but the
> patches attached to that bug do not fix this issue.
>
> Any thoughts, patches, ideas?

That looks like the known problem with the NFS server failing to close
connections in a timely manner. There is a fix for this in

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a

There is also a client side patch that increases the robustness of the
client when it hits a buggy server, and that causes it to do the
equivalent of a linger2 timeout. That patch is as of yet not merged
into mainline, however I've attached it below together with a followup
patch that makes the timeout configurable...

Cheers
  Trond
Attachment: linux-2.6.28-100-add_tcp_linger.dif
Description: application/dif

Attachment: linux-2.6.28-101-add_tcp_linger_sysctl.dif
Description: application/dif
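
For readers unfamiliar with the "linger2 timeout" mentioned in the reply: on Linux, TCP_LINGER2 is the per-socket option that bounds how long an orphaned connection may sit in FIN_WAIT2 waiting for the peer to finish closing. The sketch below only illustrates that option from userspace. It is not the attached kernel patch (whose contents are not shown here), and the function name and the 5-second value are made up for the example.

```python
import socket


def fin_wait2_timeout_demo(seconds=5):
    """Set TCP_LINGER2 on a fresh TCP socket and read the value back.

    Returns the timeout the kernel reports, or None when the platform
    does not expose or accept TCP_LINGER2 (it is Linux-specific).
    """
    opt = getattr(socket, "TCP_LINGER2", None)
    if opt is None:
        return None
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Limit the time an orphaned socket may stay in FIN_WAIT2.
            sock.setsockopt(socket.IPPROTO_TCP, opt, seconds)
            return sock.getsockopt(socket.IPPROTO_TCP, opt)
        except OSError:
            return None


if __name__ == "__main__":
    print(fin_wait2_timeout_demo())
```

The system-wide default for this timeout is the net.ipv4.tcp_fin_timeout sysctl. Whether the second attachment adds a comparable knob for the RPC client is only suggested by its file name.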