Re: Unexplained NFS mount hangs

Bryan McLellan <btm@xxxxxxxxxxxxxx> · Mon, 13 Apr 2009 16:11:35 -0700

On Mon, Apr 13, 2009 at 9:47 AM, Daniel Stickney <dstickney@xxxxxxxxxx> wrote:
> To add a little more info, in a post on April 10th titled "NFSv3 Client Timeout on 2.6.27" Bryan mentioned that his client socket was in state FIN_WAIT2, and server in CLOSE_WAIT, which is exactly what I am seeing here.
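
For anyone who wants to check for the same socket states on their own
client, here is a minimal sketch that decodes /proc/net/tcp and keeps only
the connections involving the NFS port. The function names are made up for
illustration, port 2049 is an assumption (the usual NFS port), and the
address decoding assumes a little-endian host such as x86.

```python
import socket

# State codes as printed in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_endpoint(field):
    """Turn 'HEXIP:HEXPORT' into (dotted_quad, port).

    Assumes IPv4 and a little-endian host, so the address bytes are reversed.
    """
    hex_ip, hex_port = field.split(":")
    ip = socket.inet_ntoa(bytes.fromhex(hex_ip)[::-1])
    return ip, int(hex_port, 16)


def nfs_socket_states(proc_net_tcp_text, nfs_port=2049):
    """Return (local, remote, state) for sockets that touch nfs_port."""
    rows = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local = decode_endpoint(fields[1])
        remote = decode_endpoint(fields[2])
        if nfs_port in (local[1], remote[1]):
            state = TCP_STATES.get(fields[3].upper(), fields[3])
            rows.append((local, remote, state))
    return rows
```

Feeding it the contents of /proc/net/tcp on the client should list the
mount's socket as FIN_WAIT2 if the client is in the state described above;
the same call on the server side would be expected to show CLOSE_WAIT.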

My problems started after upgrading to Ubuntu intrepid in an
'etch -> hardy -> intrepid' cycle. Since hardy shipped 2.6.24 and the
commit below first appears in 2.6.25-rc1, I wonder if the regression was
introduced by:

commit e06799f958bf7f9f8fae15f0c6f519953fb0257c
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Mon Nov 5 15:44:12 2007 -0500

    SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket

    By using shutdown() rather than close() we allow the RPC client to wait
    for the TCP close handshake to complete before we start trying to reconnect
    using the same port.
    We use shutdown(SHUT_WR) only instead of shutting down both directions,
    however we wait until the server has closed the connection on its side.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

$ git describe e06799f958bf7f9f8fae15f0c6f519953fb0257c --contains
v2.6.25-rc1~1146^2~105
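
The half-close that the commit message describes can be reproduced in
userspace. This is only a sketch under stated assumptions: two loopback
sockets stand in for the RPC client and the NFS server, the function name
is hypothetical, and none of this is the kernel's SUNRPC code path.

```python
import socket


def half_close_demo():
    """Show that shutdown(SHUT_WR) closes only the sending direction."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # any free loopback port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()

    client.shutdown(socket.SHUT_WR)      # client sends FIN, keeps reading
    eof = server.recv(16)                # b"": the server has seen the FIN
    # At this point the client side is in FIN_WAIT2 and the server side
    # is in CLOSE_WAIT, until the server closes its end.
    server.sendall(b"ok")                # the server can still send
    reply = client.recv(16)              # and the client can still receive
    server.close()                       # server FIN finishes the handshake
    tail = client.recv(16)               # b"": now both directions are done
    client.close()
    listener.close()
    return eof, reply, tail
```

If the server side never gets around to calling close(), the client has
nothing to wait for except that FIN, which would at least be consistent
with the FIN_WAIT2 / CLOSE_WAIT pair we are both seeing.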

I came in today to find that the one hung machine outside of production,
the only one I could experiment with, had eventually recovered on its
own, albeit five days later.

Apr  8 12:42:34 bvt-was02 kernel: [3706362.490101] nfs: server file01.prod.example.com not responding, still trying
Apr 13 12:09:59 bvt-was02 kernel: [4136407.174292] nfs: server file01.prod.example.com OK
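
The length of the hang can be read off the bracketed kernel timestamps in
those two lines. A quick sketch, assuming the stamps are seconds since
boot (the usual printk format); the variable and function names are just
for illustration.

```python
from datetime import timedelta

# Bracketed kernel timestamps copied from the two log lines above.
hang_start = 3706362.490101   # "not responding, still trying"
hang_end = 4136407.174292     # "OK"


def hang_duration(start_seconds, end_seconds):
    """Return the elapsed time between two kernel timestamps."""
    return timedelta(seconds=end_seconds - start_seconds)


duration = hang_duration(hang_start, hang_end)
print(duration)   # a little under five days
```

The difference is about 430045 seconds, which agrees with the syslog
wall-clock times (Apr 8 12:42:34 to Apr 13 12:09:59) to within a second.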

It looks like a lot of additional timeouts were added in 2.6.30-rc1, so
perhaps I'll compile that kernel from source and wait to see if this
happens again on the test machines.

Bryan