ext Kein He wrote:
> Hi Mockey,
>
> Could you please attach the output from "cman_tool status" and
> "cman_tool nodes -f"?
>
Thanks for your response.

I tried to run cman_tool status on as-2, but it hung without any output, and
even Ctrl+C had no effect. I opened a new window and could still ssh to as-2,
but after logging in I could not do anything; even a simple 'ls' command hung.
It seems the system is still alive but does not provide any service. Really
bad. Is there any way to debug this issue?

>
>
> Mockey Chen wrote:
>> Hi,
>>
>> I have a two-node cluster. To avoid split-brain, I use iLO as the fence
>> device and an IP tiebreaker. Here is my /etc/cluster/cluster.conf:
>> <?xml version="1.0"?>
>> <cluster alias="azerothcluster" config_version="19" name="azerothcluster">
>>   <cman expected_votes="3" two_node="0"/>
>>   <clusternodes>
>>     <clusternode name="as-1.localdomain" nodeid="1" votes="1">
>>       <fence>
>>         <method name="1">
>>           <device name="ilo1"/>
>>         </method>
>>       </fence>
>>     </clusternode>
>>     <clusternode name="as-2.localdomain" nodeid="2" votes="1">
>>       <fence>
>>         <method name="1">
>>           <device name="ilo2"/>
>>         </method>
>>       </fence>
>>     </clusternode>
>>   </clusternodes>
>>   <quorumd interval="1" tko="10" votes="1" label="pingtest">
>>     <heuristic program="ping 10.56.150.1 -c1 -t1" score="1" interval="2" tko="3"/>
>>   </quorumd>
>>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>   <fencedevices>
>>     <fencedevice agent="fence_ilo" hostname="10.56.154.18" login="power" name="ilo1" passwd="pass"/>
>>     <fencedevice agent="fence_ilo" hostname="10.56.154.19" login="power" name="ilo2" passwd="pass"/>
>>   </fencedevices>
>> ...
>> ...
>>
>> To test the case where one node loses its heartbeat, I disabled the
>> Ethernet card (eth0) on as-1. I expected as-2 to take over the services
>> running on as-1, and the as-1 node to reboot. What actually happened is
>> that as-1 lost its connection to as-2.
>> as-2 detected this and tried to reconstruct the cluster, but failed. Here
>> is the syslog from as-2:
>>
>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] The token was lost in the OPERATIONAL state.
>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] entering GATHER state from 2.
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering GATHER state from 0.
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Creating commit token because I am the rep.
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Saving state aru 1f4 high seq received 1f4
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Storing new sequence id for ring 2c
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering COMMIT state.
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering RECOVERY state.
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] position [0] member 10.56.150.4:
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] previous ring seq 40 rep 10.56.150.3
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] aru 1f4 high delivered 1f4 received flag 1
>>
>> Message from syslogd@ at Tue Feb 24 21:25:40 2009 ...
>> as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Did not need to originate any messages in recovery.
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Sending initial ORF token
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] CLM CONFIGURATION CHANGE
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] New Configuration:
>> Feb 24 21:25:40 as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
>> Feb 24 21:25:40 as-2 kernel: dlm: closing connection to node 1
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.4)
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Left:
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.3)
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Joined:
>> Feb 24 21:25:40 as-2 openais[4139]: [CMAN ] quorum lost, blocking activity
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] CLM CONFIGURATION CHANGE
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] New Configuration:
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.4)
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Left:
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Joined:
>> Feb 24 21:25:40 as-2 openais[4139]: [SYNC ] This node is within the primary component and will provide service.
>> Feb 24 21:25:40 as-2 ccsd[4130]: Cluster is not quorate. Refusing connection.
>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering OPERATIONAL state.
>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing connect: Connection refused
>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] got nodejoin message 10.56.150.4
>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
>> Feb 24 21:25:40 as-2 openais[4139]: [CPG ] got joinlist message from node 2
>> Feb 24 21:25:40 as-2 ccsd[4130]: Someone may be attempting something evil.
>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing get: Invalid request descriptor
>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something evil.
>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing get: Invalid request descriptor
>> Feb 24 21:25:41 as-2 ccsd[4130]: Invalid descriptor specified (-21).
>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something evil.
>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing disconnect: Invalid request descriptor
>> Feb 24 21:25:41 as-2 avahi-daemon[3756]: Withdrawing address record for 10.56.150.144 on eth0.
>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: Failed joining addresse
>>
>>
>> I also found some errors in as-1's syslog:
>> Feb 25 11:27:09 as-1 clurgmgrd[4332]: <err> #52: Failed changing RG status
>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> Link for eth0: Not detected
>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> No link on eth0...
>> ...
>> Feb 25 11:27:36 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 30 seconds.
>> ...
>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 60 seconds.
>> ...
>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 90 seconds.
>>
>>
>> Any comments are appreciated!
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster
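The vote arithmetic behind the posted cluster.conf can be checked with a short sketch. It assumes that CMAN treats a simple majority of expected votes, expected_votes // 2 + 1, as quorum, and that the quorumd vote is counted on a node only while qdiskd there reports the quorum device as available; both are stated assumptions, not code taken from cman. The helper name quorum_report and the trimmed CLUSTER_CONF string are hypothetical. With expected_votes="3", quorum works out to 2, so as-2 on its own stays quorate only if the quorum-disk vote is counted, which is worth comparing with the "[CMAN ] quorum lost, blocking activity" line in the as-2 log.

```python
"""Hedged sketch: vote arithmetic for the cluster.conf in this thread.

Assumptions (not taken from cman itself):
  * quorum is a simple majority: expected_votes // 2 + 1
  * the quorumd vote counts only while qdiskd reports the device available
"""
import xml.etree.ElementTree as ET

# Trimmed copy of the posted cluster.conf; fencing sections left out.
CLUSTER_CONF = """
<cluster alias="azerothcluster" config_version="19" name="azerothcluster">
  <cman expected_votes="3" two_node="0"/>
  <clusternodes>
    <clusternode name="as-1.localdomain" nodeid="1" votes="1"/>
    <clusternode name="as-2.localdomain" nodeid="2" votes="1"/>
  </clusternodes>
  <quorumd interval="1" tko="10" votes="1" label="pingtest"/>
</cluster>
"""


def quorum_report(conf_xml, alive_nodes, qdisk_available):
    """Return (present_votes, quorum, is_quorate) for a hypothetical failure."""
    root = ET.fromstring(conf_xml.strip())
    expected = int(root.find("cman").get("expected_votes"))
    quorum = expected // 2 + 1  # assumed majority rule
    # Sum the votes of the cluster nodes that are still members.
    votes = sum(
        int(node.get("votes"))
        for node in root.iter("clusternode")
        if node.get("name") in alive_nodes
    )
    # Add the quorum-disk vote only when it is assumed to be available.
    qdisk = root.find("quorumd")
    if qdisk is not None and qdisk_available:
        votes += int(qdisk.get("votes"))
    return votes, quorum, votes >= quorum


if __name__ == "__main__":
    survivor = {"as-2.localdomain"}  # as-1 has dropped off the network
    print(quorum_report(CLUSTER_CONF, survivor, qdisk_available=True))   # (2, 2, True)
    print(quorum_report(CLUSTER_CONF, survivor, qdisk_available=False))  # (1, 2, False)
```

The sketch parses the XML rather than hard-coding the numbers so that the same check can be rerun after editing votes or expected_votes.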
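The as-2 log walks the openais totem protocol through its membership states: the token is lost in OPERATIONAL, the node passes through GATHER, COMMIT and RECOVERY, and it returns to OPERATIONAL with only 10.56.150.4 in the new ring. When comparing logs from both nodes it can help to pull out just those transitions. The sketch below does that with a regular expression written against the line format quoted in this thread; the pattern, the function name totem_transitions and the SAMPLE excerpt are illustrative assumptions, not part of any cluster tool.

```python
"""Hedged sketch: list openais TOTEM state changes from syslog text.

The regular expression is written against the line format quoted in this
thread; it is illustrative and not part of any cluster tool.
"""
import re

# Matches e.g. "Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] entering GATHER state from 2."
STATE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) openais\[\d+\]: "
    r"\[TOTEM\] entering (?P<state>[A-Z]+) state"
)


def totem_transitions(log_text):
    """Return (timestamp, host, state) tuples in the order they were logged."""
    found = []
    for line in log_text.splitlines():
        match = STATE_RE.match(line.strip())
        if match:
            found.append((match["ts"], match["host"], match["state"]))
    return found


# A few lines copied from the as-2 log in this thread.
SAMPLE = """\
Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] The token was lost in the OPERATIONAL state.
Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] entering GATHER state from 2.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering GATHER state from 0.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering COMMIT state.
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering RECOVERY state.
Feb 24 21:25:40 as-2 openais[4139]: [CMAN ] quorum lost, blocking activity
Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering OPERATIONAL state.
"""

if __name__ == "__main__":
    for ts, host, state in totem_transitions(SAMPLE):
        print(ts, host, state)
```

Run against the excerpt, the list shows as-2 staying in GATHER from 21:25:35 to 21:25:40 before it moves through COMMIT and RECOVERY back to OPERATIONAL.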