ext Kein He wrote:
> I think there is a problem, from "cman_tool status" shows:
>
> Nodes: 2
> Expected votes: 3
> Total votes: 2
>
> according to your cluster.conf, if all nodes and qdisk are online,
> the "Total votes" must be "3". Probably "qdiskd" is not running, you
> can use "cman_tool nodes" to check if qdisk is working.
>
Yes, here is "cman_tool nodes" output:

Node  Sts   Inc   Joined               Name
   1   M    112   2009-02-25 03:05:19  as-1.localdomain
   2   M    104   2009-02-25 03:05:19  as-2.localdomain

A question is how to check whether qdisk is running ? and how to run it ?

Thanks.
>
> Mockey Chen wrote:
>> ext Mockey Chen wrote:
>>
>>> ext Kein He wrote:
>>>
>>>> Hi Mockey,
>>>>
>>>> Could you please attach the output from "cman_tool status" and
>>>> "cman_tool nodes -f" ?
>>>>
>>> Thanks your response.
>>>
>>> I try to run cman_tool status on as-2, but it hang, without output, and
>>> even Ctrl+C also no effect.
>>>
>> I manually reboot as-1, and the problem solved.
>>
>> There is the output of cman_tool
>>
>> [root@as-1 ~]# cman_tool status
>> Version: 6.1.0
>> Config Version: 19
>> Cluster Name: azerothcluster
>> Cluster Id: 20148
>> Cluster Member: Yes
>> Cluster Generation: 76
>> Membership state: Cluster-Member
>> Nodes: 2
>> Expected votes: 3
>> Total votes: 2
>> Quorum: 2
>> Active subsystems: 8
>> Flags: Dirty
>> Ports Bound: 0 177
>> Node name: as-1.localdomain
>> Node ID: 1
>> Multicast addresses: 239.192.78.3
>> Node addresses: 10.56.150.3
>> [root@as-1 ~]# cman_tool status -f
>> Version: 6.1.0
>> Config Version: 19
>> Cluster Name: azerothcluster
>> Cluster Id: 20148
>> Cluster Member: Yes
>> Cluster Generation: 76
>> Membership state: Cluster-Member
>> Nodes: 2
>> Expected votes: 3
>> Total votes: 2
>> Quorum: 2
>> Active subsystems: 8
>> Flags: Dirty
>> Ports Bound: 0 177
>> Node name: as-1.localdomain
>> Node ID: 1
>> Multicast addresses: 239.192.78.3
>> Node addresses: 10.56.150.3
>>
>>
>> It seems cluster can not fence one of the node. How to solve it ?
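
A side note on the vote arithmetic in the status output quoted above. As far as I know, cman derives the quorum threshold as expected_votes / 2 + 1 (integer division), which matches "Expected votes: 3" and "Quorum: 2". The Python sketch below is only an illustration: the helper names (parse_status, quorum_for, is_quorate) are mine, not part of cman, and the status text is a cut-down copy of the fields shown in this thread. It shows why losing one node drops the survivor below quorum when the quorum disk vote is absent.

```python
import re

def parse_status(text):
    """Pull the integer counters out of `cman_tool status` style output."""
    fields = {}
    for key in ("Nodes", "Expected votes", "Total votes", "Quorum"):
        match = re.search(rf"^\s*{re.escape(key)}:\s*(\d+)", text, re.MULTILINE)
        if match:
            fields[key] = int(match.group(1))
    return fields

def quorum_for(expected_votes):
    # Assumed rule: more than half of the expected votes.
    return expected_votes // 2 + 1

def is_quorate(total_votes, expected_votes):
    return total_votes >= quorum_for(expected_votes)

# Cut-down copy of the counters reported on as-1 in this thread.
STATUS = """\
Nodes: 2
Expected votes: 3
Total votes: 2
Quorum: 2
"""

status = parse_status(STATUS)
expected = status["Expected votes"]
total = status["Total votes"]

# One vote is unaccounted for; per cluster.conf that is the quorumd vote.
missing_votes = expected - total

# Survivor's vote count after the peer disappears, without and with qdisk.
survivor_without_qdisk = total - 1
survivor_with_qdisk = expected - 1

print("missing votes:", missing_votes)
print("quorate without qdisk vote:", is_quorate(survivor_without_qdisk, expected))
print("quorate with qdisk vote:", is_quorate(survivor_with_qdisk, expected))
```

With the numbers from this thread, the surviving node holds 1 vote against a threshold of 2, which is consistent with the "Quorum Dissolved" message in the as-2 syslog. With the quorum disk vote present it would hold 2.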
>>
>>
>>> I open a new window and can using ssh to as-2, but after login, I can
>>> not do anything, even a
>>> simple 'ls' command is hung.
>>>
>>> It seem the system keep alive but do not provide any service. Really
>>> bad.
>>>
>>> Any way to debug this issue ?
>>>
>>>> Mockey Chen wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a two-nodes cluster, to avoid split-brain. I use ilo as fence
>>>>> device, IP tiebreaker. here is my /etc/cluster/cluster.conf
>>>>> <?xml version="1.0"?>
>>>>> <cluster alias="azerothcluster" config_version="19"
>>>>>          name="azerothcluster">
>>>>>   <cman expected_votes="3" two_node="0"/>
>>>>>   <clusternodes>
>>>>>     <clusternode name="as-1.localdomain" nodeid="1" votes="1">
>>>>>       <fence>
>>>>>         <method name="1">
>>>>>           <device name="ilo1"/>
>>>>>         </method>
>>>>>       </fence>
>>>>>     </clusternode>
>>>>>     <clusternode name="as-2.localdomain" nodeid="2" votes="1">
>>>>>       <fence>
>>>>>         <method name="1">
>>>>>           <device name="ilo2"/>
>>>>>         </method>
>>>>>       </fence>
>>>>>     </clusternode>
>>>>>   </clusternodes>
>>>>>   <quorumd interval="1" tko="10" votes="1" label="pingtest">
>>>>>     <heuristic program="ping 10.56.150.1 -c1 -t1" score="1"
>>>>>                interval="2" tko="3"/>
>>>>>   </quorumd>
>>>>>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>>>>   <fencedevices>
>>>>>     <fencedevice agent="fence_ilo" hostname="10.56.154.18"
>>>>>                  login="power" name="ilo1" passwd="pass"/>
>>>>>     <fencedevice agent="fence_ilo" hostname="10.56.154.19"
>>>>>                  login="power" name="ilo2" passwd="pass"/>
>>>>>   </fencedevices>
>>>>> ...
>>>>> ...
>>>>>
>>>>> To test one node lost heartbeat case, I disable Ethernet card (eth0) on
>>>>> as-1, I expect as-2 takeover services on as-1 and as-1 node reboot.
>>>>> The actual is as-1 lost connection to as-2. as-2 detected it and try to
>>>>> re-construct cluster, but failed, here is the syslog from as-2
>>>>>
>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] The token was lost in the OPERATIONAL state.
>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] entering GATHER state from 2.
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering GATHER state from 0.
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Creating commit token because I am the rep.
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Saving state aru 1f4 high seq received 1f4
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Storing new sequence id for ring 2c
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering COMMIT state.
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering RECOVERY state.
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] position [0] member 10.56.150.4:
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] previous ring seq 40 rep 10.56.150.3
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] aru 1f4 high delivered 1f4 received flag 1
>>>>>
>>>>> Message from syslogd@ at Tue Feb 24 21:25:40 2009 ...
>>>>> as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Did not need to originate any messages in recovery.
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Sending initial ORF token
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] CLM CONFIGURATION CHANGE
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] New Configuration:
>>>>> Feb 24 21:25:40 as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
>>>>> Feb 24 21:25:40 as-2 kernel: dlm: closing connection to node 1
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.4)
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Left:
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.3)
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Joined:
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CMAN ] quorum lost, blocking activity
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] CLM CONFIGURATION CHANGE
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] New Configuration:
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.4)
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Left:
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Joined:
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [SYNC ] This node is within the primary component and will provide service.
>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Cluster is not quorate. Refusing connection.
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering OPERATIONAL state.
>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing connect: Connection refused
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] got nodejoin message 10.56.150.4
>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CPG ] got joinlist message from node 2
>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Someone may be attempting something evil.
>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing get: Invalid request descriptor
>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something evil.
>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing get: Invalid request descriptor
>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Invalid descriptor specified (-21).
>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something evil.
>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing disconnect: Invalid request descriptor
>>>>> Feb 24 21:25:41 as-2 avahi-daemon[3756]: Withdrawing address record for 10.56.150.144 on eth0.
>>>>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
>>>>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: Failed joining addresse
>>>>>
>>>>>
>>>>> I also found there are some errors in as-1's syslog
>>>>> Feb 25 11:27:09 as-1 clurgmgrd[4332]: <err> #52: Failed changing RG status
>>>>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> Link for eth0: Not detected
>>>>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> No link on eth0...
>>>>> ...
>>>>> Feb 25 11:27:36 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 30 seconds.
>>>>> ...
>>>>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 60 seconds.
>>>>> ...
>>>>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 90 seconds.
>>>>>
>>>>>
>>>>> any comment is appreciated!
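
One way to cross-check the cluster.conf quoted in this thread is to total the votes it configures and compare them with expected_votes. The Python sketch below does that with the standard xml.etree.ElementTree module. It works on a trimmed copy of the file: the fence sections and the parts elided with "..." are left out, and a closing </cluster> tag is added so the fragment parses. The function name configured_votes is hypothetical, and treating a missing votes attribute as 1 for a node and 0 for quorumd is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted cluster.conf (fence sections and the elided
# parts omitted; closing tag added so that the fragment is well-formed).
CONF = """<?xml version="1.0"?>
<cluster alias="azerothcluster" config_version="19" name="azerothcluster">
  <cman expected_votes="3" two_node="0"/>
  <clusternodes>
    <clusternode name="as-1.localdomain" nodeid="1" votes="1"/>
    <clusternode name="as-2.localdomain" nodeid="2" votes="1"/>
  </clusternodes>
  <quorumd interval="1" tko="10" votes="1" label="pingtest"/>
</cluster>
"""

def configured_votes(conf_xml):
    """Return (node_votes, qdisk_votes, expected_votes) from a cluster.conf string."""
    root = ET.fromstring(conf_xml)
    # Sum the votes attribute of every clusternode (assumed default: 1).
    node_votes = sum(int(node.get("votes", "1")) for node in root.iter("clusternode"))
    # The quorumd element carries the quorum disk vote (assumed default: 0).
    quorumd = root.find("quorumd")
    qdisk_votes = int(quorumd.get("votes", "0")) if quorumd is not None else 0
    expected_votes = int(root.find("cman").get("expected_votes"))
    return node_votes, qdisk_votes, expected_votes

node_votes, qdisk_votes, expected_votes = configured_votes(CONF)
print("node votes:", node_votes)
print("qdisk votes:", qdisk_votes)
print("expected votes:", expected_votes)
print("config adds up:", node_votes + qdisk_votes == expected_votes)
```

For this configuration the two node votes plus the quorumd vote add up to the expected 3, so a reported total of 2 means the quorumd vote is not being counted.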
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster