Cman does not start when the quorum disk is not available

Dear all,

We have a two-node RHEL 6.3 cluster with a quorum disk.
We are testing the cluster against different failure scenarios, and we have a problem when the shared storage is disconnected from one of the nodes. The node that has lost contact with the storage is fenced, but when the machine comes back up, cman will not start (it tries to start and then stops):


Jul  9 17:55:54 clnode1p kdump: started up
Jul  9 17:55:54 clnode1p kernel: bond0: no IPv6 routers present
Jul  9 17:55:54 clnode1p kernel: DLM (built Jun 13 2012 18:26:45) installed
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN ] Corosync built-in features: nss dbus rdma snmp
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN ] Successfully parsed cman config
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] The network interface [172.16.255.1] is now up.
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Using quorum provider quorum_cman
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jul  9 17:55:55 clnode1p corosync[2514]:   [CMAN ] CMAN 3.0.12.1 (built May 8 2012 12:22:26) started
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV ] Service engine loaded: openais checkpoint service B.01.01
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV ] Service engine loaded: corosync extended virtual synchrony service
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV ] Service engine loaded: corosync configuration service
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV ] Service engine loaded: corosync cluster config database access v1.01
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV ] Service engine loaded: corosync profile loading service
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Using quorum provider quorum_cman
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Members[1]: 1
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Members[1]: 1
Jul  9 17:55:55 clnode1p corosync[2514]:   [CPG ] chosen downlist: sender r(0) ip(172.16.255.1) ; members(old:0 left:0)
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN ] Completed service synchronization, ready to provide service.
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  9 17:55:55 clnode1p corosync[2514]:   [CMAN ] quorum regained, resuming activity
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] This node is within the primary component and will provide service.
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Members[2]: 1 2
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Members[2]: 1 2
Jul  9 17:55:55 clnode1p corosync[2514]:   [CPG ] chosen downlist: sender r(0) ip(172.16.255.1) ; members(old:1 left:0)
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN ] Completed service synchronization, ready to provide service.
Jul  9 17:55:59 clnode1p kernel: bond1: no IPv6 routers present
Jul  9 17:55:59 clnode1p qdiskd[2564]: Loading dynamic configuration
Jul  9 17:55:59 clnode1p qdiskd[2564]: Setting votes to 1
Jul  9 17:55:59 clnode1p qdiskd[2564]: Loading static configuration
Jul  9 17:55:59 clnode1p qdiskd[2564]: Timings: 8 tko, 1 interval
Jul  9 17:55:59 clnode1p qdiskd[2564]: Timings: 2 tko_up, 4 master_wait, 2 upgrade_wait
Jul  9 17:55:59 clnode1p qdiskd[2564]: Heuristic: '/bin/ping -c1 -w1 clswitch1m' score=1 interval=2 tko=4
Jul  9 17:55:59 clnode1p qdiskd[2564]: Heuristic: '/bin/ping -c1 -w1 clswitch2m' score=1 interval=2 tko=4
Jul  9 17:55:59 clnode1p qdiskd[2564]: 2 heuristics loaded
Jul  9 17:55:59 clnode1p qdiskd[2564]: Quorum Daemon: 2 heuristics, 1 interval, 8 tko, 1 votes
Jul  9 17:55:59 clnode1p qdiskd[2564]: Run Flags: 00000271
Jul  9 17:55:59 clnode1p qdiskd[2564]: stat
Jul  9 17:55:59 clnode1p qdiskd[2564]: qdisk_validate: No such file or directory
Jul  9 17:55:59 clnode1p qdiskd[2564]: Specified partition /dev/mapper/apsto1-vd01-v001 does not have a qdisk label
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV ] Unloading all Corosync service engines.
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV ] Service engine unloaded: corosync configuration service
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV ] Service engine unloaded: corosync profile loading service
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV ] Service engine unloaded: openais checkpoint service B.01.01
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Jul  9 17:56:01 clnode1p corosync[2514]:   [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1864.
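
The partition qdiskd complains about, /dev/mapper/apsto1-vd01-v001, is the quorum device on the shared storage. For reference, this is roughly how we check it on the node once it is back up (just a sketch of the commands we use, nothing cluster-specific beyond mkqdisk):

    ls -l /dev/mapper/apsto1-vd01-v001   # is the device-mapper device present at all?
    multipath -ll                        # have the paths to the storage come back?
    mkqdisk -L                           # list the devices that carry a qdisk label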


And the node remains in this state even if the storage is reattached later on, so at the moment we have only one functioning node.
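
After reattaching the storage we also try to bring the stack up by hand, roughly like this (a sketch of what we run on RHEL 6; the clvmd step only applies if clustered LVM is in use):

    service cman start        # starts corosync, qdiskd, fenced and dlm_controld
    service clvmd start       # clustered LVM, if used
    service rgmanager start   # resource manager
    clustat                   # check membership and quorum state afterwards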
What can be done to fix this, i.e. to get the cluster framework started again on that node?

Thank you,
Laszlo



--
Acceleris System Integration | and IT works
Laszlo Budai | Technical Consultant
Bvd. Barbu Vacarescu 80 | RO-020282 Bucuresti
t +40 21 23 11 538
laszlo.budai@xxxxxxxxxxxx | www.acceleris.ro
Acceleris Offices are in:
Basel | Bucharest | Zollikofen | Renens | Kloten

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



