Re: [3.2.5] NFSv3 CLOSE_WAIT hang

On Tue, 2012-09-11 at 12:40 -0700, Simon Kirby wrote:
> On Mon, Sep 10, 2012 at 09:00:37AM +0000, Yan-Pai Chen wrote:
> 
> > Hi Trond,
> > 
> > Apologies for my late response.
> > Upgrading to kernel 3.5 requires some effort. I am still working on it.
> > 
> > After applying your patch to the 3.3 kernel, the problem is gone when using UDP
> > mounts.
> > But it still hangs in the case of NFS over TCP mounts.
> > 
> > I reproduced the problem by repeatedly executing mm/mtest06_3 (i.e. mmap3) in the
> > LTP test suite.
> > After fewer than 200 runs, it eventually ran into the CLOSE_WAIT hang.
> > I got the following messages after enabling rpc_debug & nfs_debug:
> > 
> > 47991 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > 47992 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > 47993 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > 47994 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > 47995 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > ...
> 
> Hello!
> 
> This problem still bites us rarely, and we've been using TCP NFS for some
> time. However, we seem to have narrowed our case down to a very long
> storage hang on the knfsd side. If storage never has any problems, we
> don't see the NFS client hang. I was going to try to make a test-case by
> forcing the server to hang, but I never got around to this. Meanwhile,
> I've been running the clients with the debugging patches I posted
> earlier, and it always prints the "xprt_force_disconnect(): setting
> XPRT_CLOSE_WAIT" warning before hanging. If Apache is in sendfile() at
> the time, it seems to get stuck forever; otherwise, it might recover.

Does the "if (test_and_set_bit(XPRT_LOCK) == 0)" condition immediately
following that succeed so that queue_work() is called?
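
To make the question concrete, here is a minimal userspace sketch of the pattern being asked about. It is not the kernel code: the `model_` names, the struct, and the bit numbers are assumptions made for illustration, and a counter stands in for queue_work(). It only shows that the cleanup work gets queued when the lock bit was previously clear, and is skipped when some other task already holds it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers are illustrative assumptions, not the kernel's values. */
enum { MODEL_XPRT_LOCKED = 0, MODEL_XPRT_CLOSE_WAIT = 3 };

/* Hypothetical stand-in for the transport state touched by the disconnect path. */
struct model_xprt {
    atomic_ulong state;   /* bit field, like the transport's state word */
    int cleanup_queued;   /* counts simulated queue_work() calls */
};

static void model_xprt_init(struct model_xprt *x)
{
    atomic_init(&x->state, 0UL);
    x->cleanup_queued = 0;
}

/* Atomically set bit nr and return whether it was already set. */
static bool model_test_and_set_bit(int nr, atomic_ulong *addr)
{
    assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * 8));
    unsigned long mask = 1UL << nr;
    unsigned long old = atomic_fetch_or(addr, mask);
    return (old & mask) != 0;
}

/*
 * Model of the forced-disconnect step: mark the transport CLOSE_WAIT,
 * then try to take the lock bit. Only the caller that finds the bit
 * clear queues the cleanup work. Returns true if work was queued.
 */
static bool model_force_disconnect(struct model_xprt *x)
{
    atomic_fetch_or(&x->state, 1UL << MODEL_XPRT_CLOSE_WAIT);
    if (!model_test_and_set_bit(MODEL_XPRT_LOCKED, &x->state)) {
        x->cleanup_queued++;   /* stands in for queue_work() */
        return true;
    }
    return false;              /* lock bit already held: nothing queued here */
}
```

In this model, a second call made while the lock bit is still set returns false and queues nothing, so CLOSE_WAIT stays set until whoever holds the lock bit acts on it. That is the case the question above is trying to rule in or out.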

> http://www.spinics.net/lists/linux-nfs/msg29495.html
> http://0x.ca/sim/ref/3.2.10/dmesg
> 
> I suppose we could try 3.5 at this point.

If you've been keeping up with the 3.2 stable releases, then I wouldn't
expect any major differences to the sunrpc code, but it might be worth a
try in case the networking layer has changed.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com