On Wed, Feb 25, 2009 at 11:45 AM, Mockey Chen <mockey.chen@xxxxxxx> wrote:
> ext Kein He wrote:
>> I think there is a problem; "cman_tool status" shows:
>>
>> Nodes: 2
>> Expected votes: 3
>> Total votes: 2
>>
>> According to your cluster.conf, if all nodes and the qdisk are online,
>> "Total votes" should be "3". Probably qdiskd is not running; you can
>> use "cman_tool nodes" to check whether the qdisk is working.
>>
> Yes, here is the "cman_tool nodes" output:
>
> Node  Sts   Inc   Joined                Name
>    1   M    112   2009-02-25 03:05:19   as-1.localdomain
>    2   M    104   2009-02-25 03:05:19   as-2.localdomain
>
> A question: how do I check whether qdisk is running, and how do I run it?

[root@blade3 ~]# service qdiskd status
qdiskd (pid 2832) is running...
[root@blade3 ~]# pgrep qdisk -l
2832 qdiskd
[root@blade3 ~]# cman_tool nodes
Node  Sts   Inc   Joined                Name
   0   M      0   2009-02-19 16:11:55   /dev/sda5   ## This is the qdisk.
   1   M   1524   2009-02-20 22:27:32   blade1
   2   M   1552   2009-02-24 04:39:24   blade2
   3   M   1500   2009-02-19 16:11:03   blade3
   4   M   1516   2009-02-19 16:11:22   blade4

You can use "service qdiskd start" to start it, or run /usr/sbin/qdiskd -Q
directly if you don't have the init script. If you installed from RPM on a
Red Hat-type distro, the script should be there.

Regards,
brett

> Thanks.
>>
>> Mockey Chen wrote:
>>> ext Mockey Chen wrote:
>>>> ext Kein He wrote:
>>>>> Hi Mockey,
>>>>>
>>>>> Could you please attach the output from "cman_tool status" and
>>>>> "cman_tool nodes -f"?
>>>>>
>>>> Thanks for your response.
>>>>
>>>> I tried to run cman_tool status on as-2, but it hung with no output,
>>>> and even Ctrl+C had no effect.
>>>>
>>> I manually rebooted as-1, and the problem was solved.
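As an aside, the vote check Kein He suggests can be scripted. This is a
minimal sketch (the here-doc stands in for live "cman_tool status" output;
on a real node you would pipe the command's output in instead):

```shell
#!/bin/sh
# Sketch: detect a vote deficit from `cman_tool status`-style output.
# The here-doc is sample data, not a live query.
status=$(cat <<'EOF'
Nodes: 2
Expected votes: 3
Total votes: 2
EOF
)
expected=$(printf '%s\n' "$status" | awk -F': ' '/^Expected votes/ {print $2}')
total=$(printf '%s\n' "$status" | awk -F': ' '/^Total votes/ {print $2}')
if [ "$total" -lt "$expected" ]; then
    # With a qdisk configured, a missing vote usually means qdiskd is down.
    echo "vote deficit: $((expected - total)) (check qdiskd)"
fi
```

On a healthy cluster the two numbers match and the script prints nothing.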
>>>
>>> Here is the output of cman_tool:
>>>
>>> [root@as-1 ~]# cman_tool status
>>> Version: 6.1.0
>>> Config Version: 19
>>> Cluster Name: azerothcluster
>>> Cluster Id: 20148
>>> Cluster Member: Yes
>>> Cluster Generation: 76
>>> Membership state: Cluster-Member
>>> Nodes: 2
>>> Expected votes: 3
>>> Total votes: 2
>>> Quorum: 2
>>> Active subsystems: 8
>>> Flags: Dirty
>>> Ports Bound: 0 177
>>> Node name: as-1.localdomain
>>> Node ID: 1
>>> Multicast addresses: 239.192.78.3
>>> Node addresses: 10.56.150.3
>>> [root@as-1 ~]# cman_tool status -f
>>> Version: 6.1.0
>>> Config Version: 19
>>> Cluster Name: azerothcluster
>>> Cluster Id: 20148
>>> Cluster Member: Yes
>>> Cluster Generation: 76
>>> Membership state: Cluster-Member
>>> Nodes: 2
>>> Expected votes: 3
>>> Total votes: 2
>>> Quorum: 2
>>> Active subsystems: 8
>>> Flags: Dirty
>>> Ports Bound: 0 177
>>> Node name: as-1.localdomain
>>> Node ID: 1
>>> Multicast addresses: 239.192.78.3
>>> Node addresses: 10.56.150.3
>>>
>>> It seems the cluster cannot fence one of the nodes. How can I solve this?
>>>
>>>> I opened a new window and could ssh to as-2, but after login I could
>>>> not do anything; even a simple 'ls' command hung.
>>>>
>>>> It seems the system stays alive but does not provide any service.
>>>> Really bad.
>>>>
>>>> Is there any way to debug this issue?
>>>>
>>>>> Mockey Chen wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a two-node cluster. To avoid split-brain, I use iLO as the
>>>>>> fence device with an IP tiebreaker.
>>>>>> Here is my /etc/cluster/cluster.conf:
>>>>>>
>>>>>> <?xml version="1.0"?>
>>>>>> <cluster alias="azerothcluster" config_version="19" name="azerothcluster">
>>>>>>   <cman expected_votes="3" two_node="0"/>
>>>>>>   <clusternodes>
>>>>>>     <clusternode name="as-1.localdomain" nodeid="1" votes="1">
>>>>>>       <fence>
>>>>>>         <method name="1">
>>>>>>           <device name="ilo1"/>
>>>>>>         </method>
>>>>>>       </fence>
>>>>>>     </clusternode>
>>>>>>     <clusternode name="as-2.localdomain" nodeid="2" votes="1">
>>>>>>       <fence>
>>>>>>         <method name="1">
>>>>>>           <device name="ilo2"/>
>>>>>>         </method>
>>>>>>       </fence>
>>>>>>     </clusternode>
>>>>>>   </clusternodes>
>>>>>>   <quorumd interval="1" tko="10" votes="1" label="pingtest">
>>>>>>     <heuristic program="ping 10.56.150.1 -c1 -t1" score="1" interval="2" tko="3"/>
>>>>>>   </quorumd>
>>>>>>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>>>>>   <fencedevices>
>>>>>>     <fencedevice agent="fence_ilo" hostname="10.56.154.18" login="power" name="ilo1" passwd="pass"/>
>>>>>>     <fencedevice agent="fence_ilo" hostname="10.56.154.19" login="power" name="ilo2" passwd="pass"/>
>>>>>>   </fencedevices>
>>>>>>   ...
>>>>>>   ...
>>>>>>
>>>>>> To test the lost-heartbeat case, I disabled the ethernet card (eth0)
>>>>>> on as-1. I expected as-2 to take over as-1's services and as-1 to
>>>>>> reboot. What actually happened: as-1 lost its connection to as-2;
>>>>>> as-2 detected this and tried to re-form the cluster, but failed.
>>>>>> Here is the syslog from as-2:
>>>>>>
>>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] The token was lost in the OPERATIONAL state.
>>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
>>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
>>>>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] entering GATHER state from 2.
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering GATHER state from 0.
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Creating commit token because I am the rep.
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Saving state aru 1f4 high seq received 1f4
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Storing new sequence id for ring 2c
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering COMMIT state.
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering RECOVERY state.
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] position [0] member 10.56.150.4:
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] previous ring seq 40 rep 10.56.150.3
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] aru 1f4 high delivered 1f4 received flag 1
>>>>>>
>>>>>> Message from syslogd@ at Tue Feb 24 21:25:40 2009 ...
>>>>>> as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
>>>>>>
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Did not need to originate any messages in recovery.
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Sending initial ORF token
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] CLM CONFIGURATION CHANGE
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] New Configuration:
>>>>>> Feb 24 21:25:40 as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
>>>>>> Feb 24 21:25:40 as-2 kernel: dlm: closing connection to node 1
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ]     r(0) ip(10.56.150.4)
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] Members Left:
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ]     r(0) ip(10.56.150.3)
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] Members Joined:
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CMAN ] quorum lost, blocking activity
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] CLM CONFIGURATION CHANGE
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] New Configuration:
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ]     r(0) ip(10.56.150.4)
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] Members Left:
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] Members Joined:
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [SYNC ] This node is within the primary component and will provide service.
>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Cluster is not quorate.  Refusing connection.
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering OPERATIONAL state.
>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing connect: Connection refused
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM  ] got nodejoin message 10.56.150.4
>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
>>>>>> Feb 24 21:25:40 as-2 openais[4139]: [CPG  ] got joinlist message from node 2
>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Someone may be attempting something evil.
>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing get: Invalid request descriptor
>>>>>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something evil.
>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing get: Invalid request descriptor
>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Invalid descriptor specified (-21).
>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something evil.
>>>>>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing disconnect: Invalid request descriptor
>>>>>> Feb 24 21:25:41 as-2 avahi-daemon[3756]: Withdrawing address record for 10.56.150.144 on eth0.
>>>>>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
>>>>>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: Failed joining addresse
>>>>>>
>>>>>> I also found some errors in as-1's syslog:
>>>>>>
>>>>>> Feb 25 11:27:09 as-1 clurgmgrd[4332]: <err> #52: Failed changing RG status
>>>>>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> Link for eth0: Not detected
>>>>>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> No link on eth0...
>>>>>> ...
>>>>>> Feb 25 11:27:36 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 30 seconds.
>>>>>> ...
>>>>>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 60 seconds.
>>>>>> ...
>>>>>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster infrastructure after 90 seconds.
>>>>>>
>>>>>> Any comment is appreciated!
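For what it's worth, the "Quorum Dissolved" messages above are consistent
with the vote arithmetic implied by the config. A sketch, assuming cman's
usual rule that quorum = expected_votes/2 + 1 (integer division):

```shell
#!/bin/sh
# Quorum arithmetic for this cluster (assumed formula: expected/2 + 1).
expected=3                      # as-1 (1 vote) + as-2 (1 vote) + qdisk (1 vote)
quorum=$(( expected / 2 + 1 ))  # matches "Quorum: 2" in the cman_tool output
echo "quorum needed: $quorum"

# With qdiskd not contributing its vote, only the two node votes exist.
# When as-1 drops off, as-2 is left with 1 vote, below quorum -- hence
# "quorum lost, blocking activity" instead of a takeover.
votes_after_failure=1
[ "$votes_after_failure" -lt "$quorum" ] && echo "surviving node loses quorum"
```

With the qdisk vote actually present (Total votes: 3), the survivor would
hold 1 node vote + 1 qdisk vote = 2 and keep quorum, which is the whole
point of the tiebreaker.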
>>>>>>
>>>>>> --
>>>>>> Linux-cluster mailing list
>>>>>> Linux-cluster@xxxxxxxxxx
>>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster