Hi,

I'm facing a problem while testing a two-node cluster with a quorum disk. When I power off node 1, node 2 fences node 1 correctly and fails over the service, but in the log on node 2 I see, both before and after the fence success message, many messages like this:

Apr 24 11:30:04 s_sys@xn3 qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:04 s_sys@xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:05 s_sys@xn3 qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:05 s_sys@xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:06 s_sys@xn3 qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:06 s_sys@xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:07 s_sys@xn3 qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:07 s_sys@xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:08 s_sys@xn3 qdiskd[13740]: <crit> Node 2 is undead.

The problem is that after node 1 reboots, when I try to start CS5 on it again, cman fails with these messages in syslog:

Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: cluster.conf (cluster name = A0ha2, version = 1) found.
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Local version # : 1
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Remote version #: 1
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Local version # : 1
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Remote version #: 1
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Local version # : 1
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Remote version #: 1
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Local version # : 1
Apr 24 11:47:02 s_sys@xn4 ccsd[11099]: Remote version #: 1
Apr 24 11:47:31 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 30 seconds.
Apr 24 11:48:01 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 60 seconds.
Apr 24 11:48:31 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 90 seconds.
Apr 24 11:48:37 s_sys@xn4 ntpd[6179]: synchronized to 192.168.64.99, stratum 11
Apr 24 11:48:37 s_sys@xn4 ntpd[6179]: kernel time sync enabled 0001
Apr 24 11:49:01 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 120 seconds.
Apr 24 11:49:31 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 150 seconds.
Apr 24 11:50:01 s_sys@xn4 crond[11455]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Apr 24 11:50:01 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 180 seconds.
Apr 24 11:50:31 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 210 seconds.
Apr 24 11:51:01 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 240 seconds.
Apr 24 11:51:31 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 270 seconds.
Apr 24 11:52:01 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 300 seconds.
Apr 24 11:52:31 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 330 seconds.
Apr 24 11:53:01 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 360 seconds.
Apr 24 11:53:31 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 390 seconds.
Apr 24 11:54:01 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 420 seconds.
Apr 24 11:54:31 s_sys@xn4 ccsd[11099]: Unable to connect to cluster infrastructure after 450 seconds
... etc.

Or also:

Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Cluster is not quorate. Refusing connection.
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Error while processing connect: Connection refused
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Invalid descriptor specified (-111).
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Someone may be attempting something evil.
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Error while processing get: Invalid request descriptor
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Invalid descriptor specified (-111).
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Someone may be attempting something evil.
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Error while processing get: Invalid request descriptor
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Invalid descriptor specified (-21).
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Someone may be attempting something evil.
Apr 24 10:17:37 s_sys@xn4 ccsd[11023]: Error while processing disconnect: Invalid request descriptor
Apr 24 10:17:37 s_sys@xn4 rgmanager: [11331]: <notice> Cluster Service Manager is stopped.

And I can't start it again unless I first stop CS on both nodes.

My cluster.conf qdisk record is as follows:

<quorumd label="QDISK_2_0" interval="1" tko="10" votes="1" min_score="1">
    <heuristic interval="10" tko="3" program="ping -t1 -c1 192.168.64.99" score="1"/>
    <heuristic interval="10" program="ping -t3 -c1 192.168.64.99" score="1"/>
</quorumd>

I urgently need help. Does anyone have any idea what causes this problem?

Thanks a lot.
Regards,
Alain Moullé

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster