Hi,

I have a two-node cluster. To avoid split-brain, I use iLO as the fence device and an IP tiebreaker. Here is my /etc/cluster/cluster.conf:

<?xml version="1.0"?>
<cluster alias="azerothcluster" config_version="19" name="azerothcluster">
    <cman expected_votes="3" two_node="0"/>
    <clusternodes>
        <clusternode name="as-1.localdomain" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="ilo1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="as-2.localdomain" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="ilo2"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <quorumd interval="1" tko="10" votes="1" label="pingtest">
        <heuristic program="ping 10.56.150.1 -c1 -t1" score="1" interval="2" tko="3"/>
    </quorumd>
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <fencedevices>
        <fencedevice agent="fence_ilo" hostname="10.56.154.18" login="power" name="ilo1" passwd="pass"/>
        <fencedevice agent="fence_ilo" hostname="10.56.154.19" login="power" name="ilo2" passwd="pass"/>
    </fencedevices>
    ...
    ...

To test the case where one node loses its heartbeat, I disabled the Ethernet card (eth0) on as-1. I expected as-2 to take over the services running on as-1, and as-1 to reboot.

What actually happened: as-1 lost its connection to as-2. as-2 detected this and tried to re-form the cluster, but failed. Here is the syslog from as-2:

Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] The token was lost in the OPERATIONAL state.
Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] entering GATHER state from 2.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering GATHER state from 0.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Creating commit token because I am the rep.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Saving state aru 1f4 high seq received 1f4
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Storing new sequence id for ring 2c
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering COMMIT state.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering RECOVERY state.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] position [0] member 10.56.150.4:
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] previous ring seq 40 rep 10.56.150.3
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] aru 1f4 high delivered 1f4 received flag 1

Message from syslogd@ at Tue Feb 24 21:25:40 2009 ...
as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved

Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Did not need to originate any messages in recovery.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Sending initial ORF token
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] CLM CONFIGURATION CHANGE
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] New Configuration:
Feb 24 21:25:40 as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
Feb 24 21:25:40 as-2 kernel: dlm: closing connection to node 1
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.4)
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Left:
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.3)
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Joined:
Feb 24 21:25:40 as-2 openais[4139]: [CMAN ] quorum lost, blocking activity
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] CLM CONFIGURATION CHANGE
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] New Configuration:
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.4)
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Left:
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Joined:
Feb 24 21:25:40 as-2 openais[4139]: [SYNC ] This node is within the primary component and will provide service.
Feb 24 21:25:40 as-2 ccsd[4130]: Cluster is not quorate. Refusing connection.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering OPERATIONAL state.
Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing connect: Connection refused
Feb 24 21:25:40 as-2 openais[4139]: [CLM ] got nodejoin message 10.56.150.4
Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
Feb 24 21:25:40 as-2 openais[4139]: [CPG ] got joinlist message from node 2
Feb 24 21:25:40 as-2 ccsd[4130]: Someone may be attempting something evil.
Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing get: Invalid request descriptor
Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something evil.
Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing get: Invalid request descriptor
Feb 24 21:25:41 as-2 ccsd[4130]: Invalid descriptor specified (-21).
Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something evil.
Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing disconnect: Invalid request descriptor
Feb 24 21:25:41 as-2 avahi-daemon[3756]: Withdrawing address record for 10.56.150.144 on eth0.
Feb 24 21:25:41 as-2 in.rdiscd[8641]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Feb 24 21:25:41 as-2 in.rdiscd[8641]: Failed joining addresse

I also found some errors in as-1's syslog:

Feb 25 11:27:09 as-1 clurgmgrd[4332]: <err> #52: Failed changing RG status
Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> Link for eth0: Not detected
Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> No link on eth0...
...
Feb 25 11:27:36 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 30 seconds.
...
Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 60 seconds.
...
Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 90 seconds.

Any comments are appreciated!

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster