Hi all,
We run a 4-node cluster on RHEL AS 3.0 with these versions of clumanager
and redhat-config-cluster:
  redhat-config-cluster-1.0.3-1
  clumanager-1.2.22-2
Everything had been working fine for a while, but today it started
logging messages like:
Jun 20 15:13:24 node4 clulockd[3763]: <warning> Denied A.B.C.30: Broken pipe
Jun 20 15:13:24 node4 clulockd[3763]: <err> select error: Broken pipe
Jun 20 15:13:34 node4 clulockd[3763]: <warning> Denied A.B.C.29: Broken pipe
Jun 20 15:13:34 node4 clulockd[3763]: <err> select error: Broken pipe
Jun 20 15:13:48 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Offline
Jun 20 15:13:49 node4 clulockd[3763]: <warning> Denied A.B.C.30: Connection reset by peer
Jun 20 15:13:49 node4 clulockd[3763]: <err> select error: Connection reset by peer
It eventually restarted the local service, logging:
Jun 20 15:17:06 node4 clusvcmgrd[22077]: <err> Unable to obtain cluster lock: Connection timed out
Jun 20 15:17:06 node4 clusvcmgrd[22077]: <warning> Restarting locally failed service ploracm3
Jun 20 15:17:06 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Online
Jun 20 15:17:09 node4 clulockd[3763]: <warning> Denied A.B.C.29: Connection reset by peer
Jun 20 15:17:09 node4 clulockd[3763]: <err> select error: Connection reset by peer
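To see which peers were being denied and how that lines up with the quorum tiebreaker and service-manager events, a small Python sketch like the one below can tally the syslog excerpt. It is not part of clumanager; the names `LINE_RE`, `DENIED_RE` and `summarize` are hypothetical, and it assumes each syslog entry is on a single line (as in the raw log, not wrapped as mail clients tend to do). Wrapped continuation lines are simply skipped.

```python
import re
from collections import Counter

# Assumed syslog layout, taken from the excerpt above, e.g.
# "Jun 20 15:13:24 node4 clulockd[3763]: <warning> Denied A.B.C.30: Broken pipe"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) "
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: <(?P<level>\w+)> (?P<msg>.*)$"
)
# clulockd "Denied <peer>: <reason>" messages
DENIED_RE = re.compile(r"^Denied (?P<peer>\S+): (?P<reason>.+)$")


def summarize(lines):
    """Count clulockd 'Denied' events per (peer, reason) and keep an
    ordered timeline of messages from the other cluster daemons."""
    denied = Counter()
    timeline = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # wrapped continuation line or unrelated text
        daemon, msg = m.group("daemon"), m.group("msg")
        if daemon == "clulockd":
            d = DENIED_RE.match(msg)
            if d:
                denied[(d.group("peer"), d.group("reason"))] += 1
            # "select error" lines are ignored in this sketch
        else:
            # e.g. cluquorumd tiebreaker Online/Offline, clusvcmgrd restarts
            timeline.append((m.group("ts"), daemon, msg))
    return denied, timeline
```

Feeding it the unwrapped log lines (for example, `summarize(open("/var/log/messages"))`, assuming clumanager logs there on your system) gives per-peer denial counts plus the tiebreaker Offline/Online and restart events in order, which makes it easier to check whether the lock denials coincided with the tiebreaker going offline.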
My question is about the significance of maxrestarts and
maxfalsestarts. Would setting maxfalsestarts to, say, 1 have averted
this situation?
[root@node4 root]# redhat-config-cluster-cmd --service=ploracm1
service:
name = ploracm1
checkinterval = 10
failoverdomain = ploracm1
userscript = /etc/cluster/scripts/ploracm1
maxrestarts = 0
maxfalsestarts = 0
service_ipaddress:
ipaddress = A.B.C.D
netmask = 255.255.255.0
broadcast = A.B.C.255
Thanks in advance,
--AM
--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster