On Wed, 2012-09-05 at 07:49 +0000, Yan-Pai Chen wrote:
> Simon Kirby <sim@...> writes:
> 
> > 
> > Here's another CLOSE_WAIT hang, 3.2.5 client, 3.2.2 knfsd server, NFSv3.
> > Not sure why these are all happening again now. This cluster seems to
> > have a set of customers that are good at breaking things. ;)
> 
> Hi all,
> 
> I have the same problem in 3.3 kernel (client).
> After applying the following modification as suggested by Dick in
> http://www.spinics.net/lists/linux-nfs/msg32407.html, the problem
> is just gone.
> 
> Does anyone know if they are related?
> Thanks.
> 
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index c64c0ef..f979e9f 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1071,24 +1071,9 @@ void xprt_reserve(struct rpc_task *task)
>  {
>  	struct rpc_xprt *xprt = task->tk_xprt;
>  
> -	task->tk_status = 0;
> -	if (task->tk_rqstp != NULL)
> -		return;
> -
> -	/* Note: grabbing the xprt_lock_write() here is not strictly needed,
> -	 * but ensures that we throttle new slot allocation if the transport
> -	 * is congested (e.g. if reconnecting or if we're out of socket
> -	 * write buffer space).
> -	 */
> -	task->tk_timeout = 0;
> -	task->tk_status = -EAGAIN;
> -	if (!xprt_lock_write(xprt, task))
> -		return;
> -
>  	spin_lock(&xprt->reserve_lock);
>  	xprt_alloc_slot(task);
>  	spin_unlock(&xprt->reserve_lock);
> -	xprt_release_write(xprt, task);
>  }

Clearly the comment is misleading and should be removed. That write lock
_is_ needed in order to throttle slots on TCP.

As far as I know, kernel 3.3 is not in stable support any more, so I
can't help that. Can you reproduce the problem on a 3.5 kernel or higher?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
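To illustrate why the write lock throttles slot allocation, here is a toy, single-threaded model in C of the pre-patch shape of xprt_reserve(): take the transport write lock first, then allocate a slot, then release the lock. All names (toy_xprt, toy_task, toy_lock_write, toy_reserve) are hypothetical; this is not kernel code and deliberately ignores the real slot table, backlog wait queue, timeouts and wakeups. It only shows that, while another holder (for example a reconnect, or a writer stalled on socket buffer space) has the transport locked, a new task gets -EAGAIN instead of a slot.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified transport; not the kernel's struct rpc_xprt. */
struct toy_xprt {
    bool write_locked; /* stands in for the transport write lock */
    int  free_slots;   /* request slots still available */
};

/* Hypothetical, simplified task; not the kernel's struct rpc_task. */
struct toy_task {
    int  status;       /* 0 on success, -EAGAIN if the caller must retry */
    bool has_slot;
};

/* Try to take the write lock; fails if someone else already holds it,
 * e.g. because the transport is congested or reconnecting. */
static bool toy_lock_write(struct toy_xprt *x)
{
    if (x->write_locked)
        return false;
    x->write_locked = true;
    return true;
}

static void toy_release_write(struct toy_xprt *x)
{
    assert(x->write_locked); /* only the current holder may release */
    x->write_locked = false;
}

/* Allocate a slot only while holding the write lock, so new allocations
 * stall whenever the transport is locked by someone else. */
static void toy_reserve(struct toy_xprt *x, struct toy_task *t)
{
    t->status = -EAGAIN;
    if (!toy_lock_write(x))
        return;                 /* throttled: caller retries later */
    if (x->free_slots > 0) {    /* in this toy, "no slots" also means -EAGAIN */
        x->free_slots--;
        t->has_slot = true;
        t->status = 0;
    }
    toy_release_write(x);
}
```

Removing the lock/unlock pair from toy_reserve() would let a task take a slot even while the transport is locked, which is the throttling the reply says the lock exists to provide.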