On Fri, 2010-11-19 at 14:58 -0800, Simon Kirby wrote:
> On Fri, Nov 19, 2010 at 05:17:19PM -0500, Trond Myklebust wrote:
>
> > So what were all the
> >
> > 'lockd: server 10.10.52.xxx not responding, still trying'
> >
> > messages all about? There were quite a few of them for a number of
> > different servers in the moments leading up to the hang. Could it be a
> > problem with the switch these clients are attached to?
>
> If it were a switch problem, would we see port 2049 socket backlogs with
> netstat -tan or ss -tan? I haven't seen this at all when the problem
> occurs. All of the sockets are idle (and usually it seems to close them
> all except the one server that all of the slots are stuck on). tcpdump
> shows no problems, just very slow requests rates that match the rpc/nfs
> debugging.

No retransmits that might indicate dropped packets at the switch? How
fast are the tcp ACKs from the server being returned?

> If the rpc slots are stuck full, would that cause lockd to print those
> timeouts?

Yes. That would be the only kind of event that would trigger these
messages.

> Actually, another one just got stuck right now:
>
> [root@lsh1003:/root]# dmesg|tail
> lockd: server 10.10.52.227 not responding, still trying
> lockd: server 10.10.52.155 not responding, still trying
> lockd: server 10.10.52.163 not responding, still trying
> lockd: server 10.10.52.155 not responding, still trying
> lockd: server 10.10.52.150 not responding, still trying
> lockd: server 10.10.52.151 not responding, still trying
> lockd: server 10.10.52.162 not responding, still trying
> lockd: server 10.10.52.155 not responding, still trying
> lockd: server 10.10.52.163 not responding, still trying
> lockd: server 10.10.52.155 not responding, still trying
> [root@lsh1003:/root]# netstat -tano | grep 2049

lockd requests don't get sent to port 2049. They go to whatever port the
server is advertising using the RPC portmapper.

  rpcinfo -p <servername> | grep nlockmgr

should tell you on which tcp and udp ports lockd is listening. Then you
can try probing for service using

  rpcinfo -t <servername> nlockmgr
  rpcinfo -u <servername> nlockmgr

Trond
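
For anyone who wants to script the check described above, here is a rough sketch, not a tested tool. It assumes the usual five-column `rpcinfo -p` layout (program, vers, proto, port, service). The function names `nlockmgr_ports` and `probe_lockd` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `rpcinfo -p` output on stdin and print the
# unique "proto port" pairs registered for the nlockmgr service.
# Assumes the columns are: program vers proto port service.
nlockmgr_ports() {
    awk '$5 == "nlockmgr" { print $3, $4 }' | sort -u
}

# Hypothetical wrapper: show where lockd is listening on a server, then
# probe the service over TCP and UDP with the commands from the mail.
probe_lockd() {
    server=$1
    rpcinfo -p "$server" | nlockmgr_ports
    rpcinfo -t "$server" nlockmgr
    rpcinfo -u "$server" nlockmgr
}
```

Running something like `probe_lockd 10.10.52.155` on the stuck client should list the lockd ports first. Those ports, rather than 2049, are the ones to look for in the netstat or ss output.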