On Tue, 2012-09-11 at 12:40 -0700, Simon Kirby wrote:
> On Mon, Sep 10, 2012 at 09:00:37AM +0000, Yan-Pai Chen wrote:
>
> > Hi Trond,
> >
> > Apologies for my late response.
> > Upgrading to kernel 3.5 requires some effort. I am still working on it.
> >
> > After applying your patch on 3.3 kernel, the problem is gone when using UDP
> > mounts.
> > But it remains hang in the case of NFS over TCP mounts.
> >
> > I reproduced the problem by executing mm/mtest06_3 (i.e. mmap3) in the LTP test
> > suite repeatedly.
> > About less than 200 times, it eventually ran into the CLOSE_WAIT hang.
> > I got the following messages after enabling rpc_debug & nfs_debug:
> >
> > 47991 0001 -11 cf2910e0 (null) 0 c0243f40 nfsv3 WRITE
> > a:call_reserveresult q:xprt_sending
> > 47992 0001 -11 cf2910e0 (null) 0 c0243f40 nfsv3 WRITE
> > a:call_reserveresult q:xprt_sending
> > 47993 0001 -11 cf2910e0 (null) 0 c0243f40 nfsv3 WRITE
> > a:call_reserveresult q:xprt_sending
> > 47994 0001 -11 cf2910e0 (null) 0 c0243f40 nfsv3 WRITE
> > a:call_reserveresult q:xprt_sending
> > 47995 0001 -11 cf2910e0 (null) 0 c0243f40 nfsv3 WRITE
> > a:call_reserveresult q:xprt_sending
> > ...
>
> Hello!
>
> This problem still bites us rarely, and we've been using TCP NFS for some
> time. However, our case seems to be narrowed it down to a very long
> storage hang on the knfsd side. If storage never has any problems, we
> don't see the NFS client hang. I was going to try to make a test-case by
> forcing the server to hang, but I never got around to this. Meanwhile,
> I've been running the clients with the debugging patches I posted
> earlier, and it always prints the 'xprt_force_disconnect(): setting
> XPRT_CLOSE_WAIT" warning before hanging. If Apache is in sendfile() at
> the time, it seems to get stuck forever; otherwise, it might recover.

Does the "if (test_and_set_bit(XPRT_LOCK) == 0)" condition immediately
following that succeed so that queue_work() is called?

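To make the question concrete, here is a minimal userspace sketch of the
pattern being asked about. It is not the kernel code: the model_* names,
the bit values and the counter standing in for queue_work() are all
assumptions made for illustration, and C11 atomics replace the kernel's
bitops. It only shows that the cleanup work gets queued when the lock bit
was previously clear, and is skipped when something else already holds it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's real flag numbers are not used here. */
enum {
	MODEL_LOCK       = 1u << 0,	/* stands in for the XPRT_LOCK bit above */
	MODEL_CLOSE_WAIT = 1u << 1	/* stands in for XPRT_CLOSE_WAIT */
};

struct model_xprt {
	atomic_uint state;
	unsigned int cleanup_queued;	/* counts stand-ins for queue_work() */
};

/* Atomically set a bit; return true if it was already set beforehand. */
static bool model_test_and_set_bit(unsigned int bit, atomic_uint *state)
{
	return (atomic_fetch_or(state, bit) & bit) != 0;
}

/*
 * Model of the sequence in question: mark the transport CLOSE_WAIT, then
 * try to take the lock bit. Only the caller that finds the bit clear
 * "queues" the cleanup work. Returns true if the work was queued.
 */
static bool model_force_disconnect(struct model_xprt *xprt)
{
	atomic_fetch_or(&xprt->state, MODEL_CLOSE_WAIT);

	if (!model_test_and_set_bit(MODEL_LOCK, &xprt->state)) {
		xprt->cleanup_queued++;	/* real code would call queue_work() */
		return true;
	}
	return false;	/* lock already held: nothing queued by this call */
}
```

In this model, a transport whose lock bit is already set ends up with
CLOSE_WAIT set but no cleanup queued by this call, so whether the close
ever runs depends on the current lock holder noticing CLOSE_WAIT when it
releases the lock.
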
> http://www.spinics.net/lists/linux-nfs/msg29495.html
> http://0x.ca/sim/ref/3.2.10/dmesg
>
> I suppose we could try 3.5 at this point.

If you've been keeping up with the 3.2 stable releases, then I wouldn't
expect any major differences to the sunrpc code, but it might be worth a
try in case the networking layer has changed.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com