Jumping in late on this thread, so pardon my ignorance of some details...

On Wed, Apr 18, 2012 at 4:35 PM, Steve Thompson <smt@xxxxxxxxxxxx> wrote:
> Interesting. It looks like some kind of RPC failure. During the hang, I
> cannot contact the nfs service via RPC:
>
> # rpcinfo -t <server> nfs
> rpcinfo: RPC: Timed out
> program 100003 version 0 is not available
>

Did you run this command during "the hang", or does it always return
that? If the latter, are you blocking UDP on either the server or the
client?

> # rpcinfo -p <server>
>    program vers proto   port
>     100000    2   tcp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp   1007  status
>     100024    1   tcp   1010  status
>     100021    1   udp  35077  nlockmgr
>     100021    3   udp  35077  nlockmgr
>     100021    4   udp  35077  nlockmgr
>     100021    1   tcp  56622  nlockmgr
>     100021    3   tcp  56622  nlockmgr
>     100021    4   tcp  56622  nlockmgr
>     100011    1   udp   1009  rquotad
>     100011    2   udp   1009  rquotad
>     100011    1   tcp   1012  rquotad
>     100011    2   tcp   1012  rquotad
>     100003    2   udp   2049  nfs
>     100003    3   udp   2049  nfs
>     100003    4   udp   2049  nfs
>     100003    2   tcp   2049  nfs
>     100003    3   tcp   2049  nfs
>     100003    4   tcp   2049  nfs
>     100005    1   udp    605  mountd
>     100005    1   tcp    608  mountd
>     100005    2   udp    605  mountd
>     100005    2   tcp    608  mountd
>     100005    3   udp    605  mountd
>     100005    3   tcp    608  mountd
>
> However, I can connect to the service via telnet:
>
> # telnet <server> nfs
> Trying <ipaddr>...
> Connected to <server> (<ipaddr>).
> Escape character is '^]'.
>

If you don't specify a transport protocol, rpcinfo will use whatever is
defined in the /etc/netconfig database, which is usually UDP.

A couple of ideas/questions:

- Is it happening at the exact same minute each time (e.g. 2:15, 2:45,
  3:15, 3:45)? This might help you identify a script/program that
  follows that schedule.
- Is any configuration different between this server and the others?
  (/etc/system, root crontab, etc.)
- When you say everything else BUT NFS is working fine, are pings
  answered properly, without increased latency, during "the hang"?
- What about other services? Can you set up a monitoring script that
  connects to some other service (e.g. ftp with ls and exit, or ssh)
  and reports the total run time?
- Can you set up a monitoring script that runs "rpcinfo" against
  localhost, to check whether both local and remote communications hang?

--
Giovanni
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
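
PS: the monitoring script suggested in the last two bullets could look
roughly like the sketch below. It is only an illustration, not something
I have run against your setup: the function names (probe, run_probes),
the probe labels and the 5-second ping timeout are my own assumptions.
It times each check and prints one line per check, so you can compare
ping, remote rpcinfo and local rpcinfo around the time of a hang.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and print
# "<timestamp> <name> rc=<exit status> secs=<elapsed whole seconds>".
# The ( ... ) body runs in a subshell, so the variables stay local.
probe() (
    name=$1
    shift
    start=$(date +%s)
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    end=$(date +%s)
    printf '%s %s rc=%d secs=%d\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$name" "$rc" "$((end - start))"
)

# Hypothetical helper: one round of checks against the NFS server.
# rpc-local is only meaningful when this runs on the NFS server itself.
run_probes() {
    server=$1
    probe ping       ping -c 1 -W 5 "$server"
    probe rpc-remote rpcinfo -t "$server" nfs
    probe rpc-local  rpcinfo -t localhost nfs
}
```

To collect data, you could call it once a minute from a loop or from
cron, e.g. `run_probes <server> >> /var/tmp/nfs-probe.log` (the log path
is just an example), and then look at the secs= values around the hang.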