Re: NFS client/sunrpc getting stuck on 2.6.36

On Fri, 2010-11-19 at 14:58 -0800, Simon Kirby wrote:
> On Fri, Nov 19, 2010 at 05:17:19PM -0500, Trond Myklebust wrote:
> > So what were all the 
> > 
> > 'lockd: server 10.10.52.xxx not responding, still trying'
> > 
> > messages all about? There were quite a few of them for a number of
> > different servers in the moments leading up to the hang. Could it be a
> > problem with the switch these clients are attached to?
> 
> If it were a switch problem, would we see port 2049 socket backlogs with
> netstat -tan or ss -tan?  I haven't seen this at all when the problem
> occurs.  All of the sockets are idle (and usually it seems to close them
> all except the one server that all of the slots are stuck on).  tcpdump
> shows no problems, just very slow request rates that match the rpc/nfs
> debugging.

No retransmits that might indicate dropped packets at the switch? How
fast are the tcp ACKs from the server being returned?

> If the rpc slots are stuck full, would that cause lockd to print those
> timeouts?

Yes. That would be the only kind of event that would trigger these
messages.

> Actually, another one just got stuck right now:
> 
> [root@lsh1003:/root]# dmesg|tail
> lockd: server 10.10.52.227 not responding, still trying
> lockd: server 10.10.52.155 not responding, still trying
> lockd: server 10.10.52.163 not responding, still trying
> lockd: server 10.10.52.155 not responding, still trying
> lockd: server 10.10.52.150 not responding, still trying
> lockd: server 10.10.52.151 not responding, still trying
> lockd: server 10.10.52.162 not responding, still trying
> lockd: server 10.10.52.155 not responding, still trying
> lockd: server 10.10.52.163 not responding, still trying
> lockd: server 10.10.52.155 not responding, still trying
> [root@lsh1003:/root]# netstat -tano | grep 2049

lockd requests don't get sent to port 2049. They go to whatever port the
server is advertising using the RPC portmapper.

rpcinfo -p <servername> | grep nlockmgr

should tell you on which tcp and udp ports lockd is listening. Then you
can try probing for service using

rpcinfo -t <servername> nlockmgr
rpcinfo -u <servername> nlockmgr
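
As a rough sketch of the first step above, the ports can be pulled out of the portmapper listing with a small filter. The function name nlockmgr_ports is made up for illustration and is not part of any tool. It assumes the usual five-column "rpcinfo -p" layout (program, vers, proto, port, service), so check it against the output on your own systems.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): reads `rpcinfo -p` output on
# stdin and prints one "proto port" pair per line for the nlockmgr service.
nlockmgr_ports() {
    # Assumed column order of `rpcinfo -p`: program, vers, proto, port, service.
    # nlockmgr is usually registered once per version, so collapse duplicates.
    awk '$5 == "nlockmgr" { print $3, $4 }' | sort -u
}

# Example use (<servername> and <port> are placeholders):
#   rpcinfo -p <servername> | nlockmgr_ports
#   netstat -tan | grep ":<port> "
```

With the tcp port in hand, the netstat or ss check can be repeated against that port instead of 2049.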

Trond
