ext Kein He wrote:
>
> Unfortunately, you need a shared disk to run qdisk; it cannot work
> in "diskless" mode right now.
>
Is there a way to avoid it? Unfortunately, I do not have a shared disk.

>
>> ext Brett Cave wrote:
>>
>>> On Wed, Feb 25, 2009 at 11:45 AM, Mockey Chen <mockey.chen@xxxxxxx>
>>> wrote:
>>>
>>>> ext Kein He wrote:
>>>>
>>>>> I think there is a problem, from what "cman_tool status" shows:
>>>>>
>>>>> Nodes: 2
>>>>> Expected votes: 3
>>>>> Total votes: 2
>>>>>
>>>>> According to your cluster.conf, if all nodes and the qdisk are
>>>>> online, the "Total votes" must be "3". Probably qdiskd is not
>>>>> running; you can use "cman_tool nodes" to check whether the qdisk
>>>>> is working.
>>>>>
>>>> Yes, here is the "cman_tool nodes" output:
>>>> Node  Sts   Inc   Joined               Name
>>>>    1   M    112   2009-02-25 03:05:19  as-1.localdomain
>>>>    2   M    104   2009-02-25 03:05:19  as-2.localdomain
>>>>
>>>> A question: how do I check whether qdiskd is running, and how do I
>>>> start it?
>>>>
>>> [root@blade3 ~]# service qdiskd status
>>> qdiskd (pid 2832) is running...
>>> [root@blade3 ~]# pgrep qdisk -l
>>> 2832 qdiskd
>>> [root@blade3 ~]# cman_tool nodes
>>> Node  Sts   Inc   Joined               Name
>>>    0   M      0   2009-02-19 16:11:55  /dev/sda5  ## This is the qdisk.
>>>    1   M   1524   2009-02-20 22:27:32  blade1
>>>    2   M   1552   2009-02-24 04:39:24  blade2
>>>    3   M   1500   2009-02-19 16:11:03  blade3
>>>    4   M   1516   2009-02-19 16:11:22  blade4
>>>
>>> You can use "service qdiskd start" to start it, or run it with
>>> /usr/sbin/qdiskd -Q if you don't have the init script. If you
>>> installed from rpm on a RH-type distro, the script should be there.
>>>
>>> Regards,
>>> brett
>>>
>> I tried "service qdiskd start", but it failed:
>> [root@as-2 ~]# service qdiskd start
>> Starting the Quorum Disk Daemon: [FAILED]
>> [root@as-2 ~]# tail /var/log/messages
>> ...
>> Feb 26 09:19:40 as-2 qdiskd[14707]: <crit> Unable to match label
>> 'testing' to any device
>> Feb 26 09:19:46 as-2 clurgmgrd[4032]: <notice> Reconfiguring
>>
>> Here is my qdisk configuration; I copied it from "man qdisk":
>> <quorumd interval="1" tko="10" votes="1" label="testing">
>>     <heuristic program="ping 10.56.150.1 -c1 -t1" score="1"
>>      interval="2" tko="3"/>
>> </quorumd>
>>
>> How do I map the label to a device? Note: I do not have any shared
>> storage.
>>
>>>> Thanks.
>>>>
>>>>> Mockey Chen wrote:
>>>>>
>>>>>> ext Mockey Chen wrote:
>>>>>>
>>>>>>> ext Kein He wrote:
>>>>>>>
>>>>>>>> Hi Mockey,
>>>>>>>>
>>>>>>>> Could you please attach the output from "cman_tool status" and
>>>>>>>> "cman_tool nodes -f"?
>>>>>>>>
>>>>>>> Thanks for your response.
>>>>>>>
>>>>>>> I tried to run cman_tool status on as-2, but it hung with no
>>>>>>> output; even Ctrl+C had no effect.
>>>>>>>
>>>>>> I manually rebooted as-1, and the problem was solved.
>>>>>>
>>>>>> Here is the output of cman_tool:
>>>>>>
>>>>>> [root@as-1 ~]# cman_tool status
>>>>>> Version: 6.1.0
>>>>>> Config Version: 19
>>>>>> Cluster Name: azerothcluster
>>>>>> Cluster Id: 20148
>>>>>> Cluster Member: Yes
>>>>>> Cluster Generation: 76
>>>>>> Membership state: Cluster-Member
>>>>>> Nodes: 2
>>>>>> Expected votes: 3
>>>>>> Total votes: 2
>>>>>> Quorum: 2
>>>>>> Active subsystems: 8
>>>>>> Flags: Dirty
>>>>>> Ports Bound: 0 177
>>>>>> Node name: as-1.localdomain
>>>>>> Node ID: 1
>>>>>> Multicast addresses: 239.192.78.3
>>>>>> Node addresses: 10.56.150.3
>>>>>> [root@as-1 ~]# cman_tool status -f
>>>>>> Version: 6.1.0
>>>>>> Config Version: 19
>>>>>> Cluster Name: azerothcluster
>>>>>> Cluster Id: 20148
>>>>>> Cluster Member: Yes
>>>>>> Cluster Generation: 76
>>>>>> Membership state: Cluster-Member
>>>>>> Nodes: 2
>>>>>> Expected votes: 3
>>>>>> Total votes: 2
>>>>>> Quorum: 2
>>>>>> Active subsystems: 8
>>>>>> Flags: Dirty
>>>>>> Ports Bound: 0 177
>>>>>> Node name: as-1.localdomain
>>>>>> Node ID: 1
>>>>>> Multicast addresses: 239.192.78.3
>>>>>> Node addresses: 10.56.150.3
>>>>>>
>>>>>> It seems the cluster cannot fence one of the nodes. How can I
>>>>>> solve this?
>>>>>>
>>>>>>> I opened a new window and could ssh to as-2, but after logging
>>>>>>> in I could not do anything; even a simple 'ls' command hung.
>>>>>>>
>>>>>>> It seems the system stays alive but does not provide any
>>>>>>> service. Really bad.
>>>>>>>
>>>>>>> Is there any way to debug this issue?
>>>>>>>
>>>>>>>> Mockey Chen wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a two-node cluster. To avoid split-brain, I use iLO as
>>>>>>>>> the fence device and an IP tiebreaker. Here is my
>>>>>>>>> /etc/cluster/cluster.conf:
>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>> <cluster alias="azerothcluster" config_version="19"
>>>>>>>>>  name="azerothcluster">
>>>>>>>>>     <cman expected_votes="3" two_node="0"/>
>>>>>>>>>     <clusternodes>
>>>>>>>>>         <clusternode name="as-1.localdomain" nodeid="1" votes="1">
>>>>>>>>>             <fence>
>>>>>>>>>                 <method name="1">
>>>>>>>>>                     <device name="ilo1"/>
>>>>>>>>>                 </method>
>>>>>>>>>             </fence>
>>>>>>>>>         </clusternode>
>>>>>>>>>         <clusternode name="as-2.localdomain" nodeid="2" votes="1">
>>>>>>>>>             <fence>
>>>>>>>>>                 <method name="1">
>>>>>>>>>                     <device name="ilo2"/>
>>>>>>>>>                 </method>
>>>>>>>>>             </fence>
>>>>>>>>>         </clusternode>
>>>>>>>>>     </clusternodes>
>>>>>>>>>     <quorumd interval="1" tko="10" votes="1" label="pingtest">
>>>>>>>>>         <heuristic program="ping 10.56.150.1 -c1 -t1" score="1"
>>>>>>>>>          interval="2" tko="3"/>
>>>>>>>>>     </quorumd>
>>>>>>>>>     <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>>>>>>>>     <fencedevices>
>>>>>>>>>         <fencedevice agent="fence_ilo" hostname="10.56.154.18"
>>>>>>>>>          login="power" name="ilo1" passwd="pass"/>
>>>>>>>>>         <fencedevice agent="fence_ilo" hostname="10.56.154.19"
>>>>>>>>>          login="power" name="ilo2" passwd="pass"/>
>>>>>>>>>     </fencedevices>
>>>>>>>>> ...
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> To test the case where one node loses heartbeat, I disabled the
>>>>>>>>> Ethernet card (eth0) on as-1. I expected as-2 to take over the
>>>>>>>>> services running on as-1, and as-1 to reboot.
>>>>>>>>> What actually happened is that as-1 lost its connection to
>>>>>>>>> as-2; as-2 detected this and tried to re-form the cluster, but
>>>>>>>>> failed. Here is the syslog from as-2:
>>>>>>>>>
>>>>>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] The token was lost
>>>>>>>>> in the OPERATIONAL state.
>>>>>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Receive multicast
>>>>>>>>> socket recv buffer size (288000 bytes).
>>>>>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Transmit multicast
>>>>>>>>> socket send buffer size (262142 bytes).
>>>>>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] entering GATHER
>>>>>>>>> state from 2.
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering GATHER
>>>>>>>>> state from 0.
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Creating commit
>>>>>>>>> token because I am the rep.
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Saving state aru
>>>>>>>>> 1f4 high seq received 1f4
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Storing new
>>>>>>>>> sequence id for ring 2c
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering COMMIT
>>>>>>>>> state.
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering RECOVERY
>>>>>>>>> state.
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] position [0] member
>>>>>>>>> 10.56.150.4:
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] previous ring seq
>>>>>>>>> 40 rep 10.56.150.3
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] aru 1f4 high
>>>>>>>>> delivered 1f4 received flag 1
>>>>>>>>>
>>>>>>>>> Message from syslogd@ at Tue Feb 24 21:25:40 2009 ...
>>>>>>>>> as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Did not need to
>>>>>>>>> originate any messages in recovery.
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Sending initial
>>>>>>>>> ORF token
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] CLM CONFIGURATION
>>>>>>>>> CHANGE
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] New Configuration:
>>>>>>>>> Feb 24 21:25:40 as-2 clurgmgrd[4194]: <emerg> #1: Quorum
>>>>>>>>> Dissolved
>>>>>>>>> Feb 24 21:25:40 as-2 kernel: dlm: closing connection to node 1
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ]     r(0)
>>>>>>>>> ip(10.56.150.4)
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] Members Left:
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ]     r(0)
>>>>>>>>> ip(10.56.150.3)
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] Members Joined:
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CMAN ] quorum lost,
>>>>>>>>> blocking activity
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] CLM CONFIGURATION
>>>>>>>>> CHANGE
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] New Configuration:
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ]     r(0)
>>>>>>>>> ip(10.56.150.4)
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] Members Left:
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] Members Joined:
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [SYNC ] This node is
>>>>>>>>> within the primary component and will provide service.
>>>>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Cluster is not quorate.
>>>>>>>>> Refusing connection.
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering
>>>>>>>>> OPERATIONAL state.
>>>>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing
>>>>>>>>> connect: Connection refused
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] got nodejoin
>>>>>>>>> message 10.56.150.4
>>>>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified
>>>>>>>>> (-111).
>>>>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CPG  ] got joinlist
>>>>>>>>> message from node 2
>>>>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Someone may be attempting
>>>>>>>>> something evil.
>>>>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing get:
>>>>>>>>> Invalid request descriptor
>>>>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified
>>>>>>>>> (-111).
>>>>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting
>>>>>>>>> something evil.
>>>>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing get:
>>>>>>>>> Invalid request descriptor
>>>>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Invalid descriptor specified
>>>>>>>>> (-21).
>>>>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting
>>>>>>>>> something evil.
>>>>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing
>>>>>>>>> disconnect: Invalid request descriptor
>>>>>>>>> Feb 24 21:25:41 as-2 avahi-daemon[3756]: Withdrawing address
>>>>>>>>> record for 10.56.150.144 on eth0.
>>>>>>>>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: setsockopt
>>>>>>>>> (IP_ADD_MEMBERSHIP): Address already in use
>>>>>>>>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: Failed joining addresse
>>>>>>>>>
>>>>>>>>> I also found some errors in as-1's syslog:
>>>>>>>>> Feb 25 11:27:09 as-1 clurgmgrd[4332]: <err> #52: Failed
>>>>>>>>> changing RG status
>>>>>>>>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> Link for
>>>>>>>>> eth0: Not detected
>>>>>>>>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> No link on
>>>>>>>>> eth0...
>>>>>>>>> ...
>>>>>>>>> Feb 25 11:27:36 as-1 ccsd[4268]: Unable to connect to cluster
>>>>>>>>> infrastructure after 30 seconds.
>>>>>>>>> ...
>>>>>>>>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster
>>>>>>>>> infrastructure after 60 seconds.
>>>>>>>>> ...
>>>>>>>>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster
>>>>>>>>> infrastructure after 90 seconds.
>>>>>>>>>
>>>>>>>>> Any comment is appreciated!
>>>>>>>>>

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
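For reference, the earlier "Unable to match label 'testing' to any device" error means qdiskd scanned the visible block devices and found none carrying the label named in cluster.conf. If a shared LUN *were* available, the label would be written with mkqdisk. A minimal sketch, where /dev/sdX is a placeholder for a small partition visible to every node (qdisk cannot be made to work without such shared storage):

```shell
# Write the quorum-disk header and label onto a shared partition.
# Assumption: /dev/sdX is a LUN that all cluster nodes can see.
mkqdisk -c /dev/sdX -l testing

# List quorum-disk labels visible from this node; the label must match
# the label= attribute of <quorumd> in cluster.conf.
mkqdisk -L

# Start the daemon once the label resolves.
service qdiskd start
```

These commands touch real hardware, so treat them as an outline to adapt, not something to paste verbatim.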
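Since no shared disk is available here, the usual alternative to qdisk for a two-node cluster is cman's special two-node mode: each node has one vote and the cluster remains quorate with a single vote, relying entirely on fencing to resolve the split. A sketch of the relevant cluster.conf fragment (untested against this particular setup; it replaces the expected_votes="3" line and the <quorumd> block):

```xml
<cman expected_votes="1" two_node="1"/>
```

The trade-off is that after a network partition both nodes race to fence each other, so working fence devices (the iLO configuration above) become mandatory rather than optional.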
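The "Quorum Dissolved" message above follows directly from the vote arithmetic in this configuration: with expected_votes="3" a simple majority of 2 votes is required, the qdisk vote was never registered (total votes stayed at 2), so losing one node leaves a single vote. A rough sketch of the rule, in plain Python purely for illustration (cman's real computation lives in the cluster code, not here):

```python
def has_quorum(expected_votes: int, total_votes: int) -> bool:
    """Simple-majority quorum rule: quorum = expected_votes // 2 + 1."""
    quorum = expected_votes // 2 + 1
    return total_votes >= quorum

# Intended setup: 2 node votes + 1 qdisk vote.
print(has_quorum(3, 3))  # True  -- both nodes plus qdisk
# Situation in this thread: qdiskd never started, so total is only 2.
print(has_quorum(3, 2))  # True  -- quorate, but with no margin at all
# After eth0 is pulled on as-1, as-2 is left with just its own vote.
print(has_quorum(3, 1))  # False -- quorum dissolved, activity blocked
```

This is why as-2 could not re-form the cluster on its own: one vote out of an expected three can never reach the majority of two.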