Question on maxrestarts and maxfalserestarts

Anu Matthew <anu.matthew@xxxxxxx> · Mon, 20 Jun 2005 16:48:32 -0400

Hi all,

We run a 4-node cluster on RHEL AS3.0 with these versions of clumanager 
and redhat-config-cluster:

redhat-config-cluster-1.0.3-1
clumanager-1.2.22-2

Everything had been working fine for a while, but today it started to 
log messages like:

Jun 20 15:13:24 node4 clulockd[3763]: <warning> Denied A.B.C.30: Broken pipe
Jun 20 15:13:24 node4 clulockd[3763]: <err> select error: Broken pipe
Jun 20 15:13:34 node4 clulockd[3763]: <warning> Denied A.B.C.29: Broken pipe
Jun 20 15:13:34 node4 clulockd[3763]: <err> select error: Broken pipe
Jun 20 15:13:48 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Offline
Jun 20 15:13:49 node4 clulockd[3763]: <warning> Denied A.B.C.30: Connection reset by peer
Jun 20 15:13:49 node4 clulockd[3763]: <err> select error: Connection reset by peer

It then ended up restarting the service locally, logging:

Jun 20 15:17:06 node4 clusvcmgrd[22077]: <err> Unable to obtain cluster lock: Connection timed out
Jun 20 15:17:06 node4 clusvcmgrd[22077]: <warning> Restarting locally failed service ploracm3
Jun 20 15:17:06 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Online
Jun 20 15:17:09 node4 clulockd[3763]: <warning> Denied A.B.C.29: Connection reset by peer
Jun 20 15:17:09 node4 clulockd[3763]: <err> select error: Connection reset by peer
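
To make the timing in the excerpts above easier to see, here is a minimal 
Python sketch that measures how long the tiebreaker was reported offline. 
It assumes that "IPv4 TB" refers to the IP tiebreaker, that the year is 
2005 (syslog lines carry no year), and that wrapped lines have been 
rejoined. The helper names parse and tiebreaker_outages are made up for 
illustration and are not part of clumanager.

```python
import re
from datetime import datetime

# Lines copied from the excerpts above, with wrapped lines rejoined.
LOG = """\
Jun 20 15:13:48 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Offline
Jun 20 15:17:06 node4 clusvcmgrd[22077]: <err> Unable to obtain cluster lock: Connection timed out
Jun 20 15:17:06 node4 clusvcmgrd[22077]: <warning> Restarting locally failed service ploracm3
Jun 20 15:17:06 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Online
"""

# timestamp, host, daemon[pid], <level>, message
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: <(?P<level>\w+)> (?P<msg>.*)$"
)


def parse(line, year=2005):
    """Split one syslog line into fields; return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # The year is an assumption, since syslog timestamps do not include it.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return {"time": ts, "daemon": m["daemon"], "level": m["level"], "msg": m["msg"]}


def tiebreaker_outages(lines):
    """Pair each cluquorumd 'IPv4 TB ... Offline' notice with the next 'Online'."""
    outages = []
    offline_at = None
    for line in lines:
        ev = parse(line)
        if ev is None or ev["daemon"] != "cluquorumd" or "IPv4 TB" not in ev["msg"]:
            continue
        if ev["msg"].endswith("Offline"):
            offline_at = ev["time"]
        elif ev["msg"].endswith("Online") and offline_at is not None:
            outages.append((offline_at, ev["time"]))
            offline_at = None
    return outages


outages = tiebreaker_outages(LOG.splitlines())
for start, end in outages:
    seconds = (end - start).total_seconds()
    print(f"tiebreaker offline {start:%H:%M:%S} -> {end:%H:%M:%S} ({seconds:.0f} s)")
```

On the lines above this reports a single outage from 15:13:48 to 15:17:06, 
that is 198 seconds. The failed lock attempt and the local restart are 
logged in the same second in which the tiebreaker comes back online.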

My question is about the significance of maxrestarts and 
maxfalserestarts. Would setting maxfalserestarts to, say, 1 have 
averted this situation?

[root@node4 root]# redhat-config-cluster-cmd --service=ploracm1

service:
 name = ploracm1
 checkinterval = 10
 failoverdomain = ploracm1
 userscript = /etc/cluster/scripts/ploracm1
 maxrestarts = 0
 maxfalsestarts = 0

service_ipaddress:
 ipaddress = A.B.C.D
 netmask = 255.255.255.0
 broadcast = A.B.C.255
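
For anyone who wants to check these values across several services, here 
is a rough Python sketch that reads the output above into a dictionary 
and prints the restart-related keys. The layout (a "section:" line 
followed by indented "key = value" lines) is assumed from this one 
sample only, and parse_dump is a made-up helper name, not something 
shipped with redhat-config-cluster.

```python
# Output copied from the redhat-config-cluster-cmd call above.
DUMP = """\
service:
 name = ploracm1
 checkinterval = 10
 failoverdomain = ploracm1
 userscript = /etc/cluster/scripts/ploracm1
 maxrestarts = 0
 maxfalsestarts = 0

service_ipaddress:
 ipaddress = A.B.C.D
 netmask = 255.255.255.0
 broadcast = A.B.C.255
"""


def parse_dump(text):
    """Return {section: {key: value}} for 'section:' / 'key = value' text."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A bare "name:" line starts a new section.
            current = sections.setdefault(line[:-1], {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections


cfg = parse_dump(DUMP)
# Look up both spellings: the one printed by the tool and the one I used
# in the question.
for key in ("maxrestarts", "maxfalsestarts", "maxfalserestarts"):
    print(key, "=", cfg["service"].get(key, "<not present>"))
```

With the output above, maxrestarts and maxfalsestarts both come back as 
0, and the spelling maxfalserestarts is not present in the dump.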

Thanks in advance,

--AM

--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster