Dear all,
We have a RHEL 6.3 cluster of two nodes and a quorum disk.
We are testing the cluster against different failure scenarios, and we have
a problem when the shared storage is disconnected from one of the nodes.
The node that has lost contact with the storage is fenced, but when the
machine restarts, cman will not start up (it tries to start but then stops):
Jul 9 17:55:54 clnode1p kdump: started up
Jul 9 17:55:54 clnode1p kernel: bond0: no IPv6 routers present
Jul 9 17:55:54 clnode1p kernel: DLM (built Jun 13 2012 18:26:45) installed
Jul 9 17:55:55 clnode1p corosync[2514]: [MAIN ] Corosync Cluster
Engine ('1.4.1'): started and ready to provide service.
Jul 9 17:55:55 clnode1p corosync[2514]: [MAIN ] Corosync built-in
features: nss dbus rdma snmp
Jul 9 17:55:55 clnode1p corosync[2514]: [MAIN ] Successfully read
config from /etc/cluster/cluster.conf
Jul 9 17:55:55 clnode1p corosync[2514]: [MAIN ] Successfully parsed
cman config
Jul 9 17:55:55 clnode1p corosync[2514]: [TOTEM ] Initializing
transport (UDP/IP Multicast).
Jul 9 17:55:55 clnode1p corosync[2514]: [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 9 17:55:55 clnode1p corosync[2514]: [TOTEM ] The network
interface [172.16.255.1] is now up.
Jul 9 17:55:55 clnode1p corosync[2514]: [QUORUM] Using quorum
provider quorum_cman
Jul 9 17:55:55 clnode1p corosync[2514]: [SERV ] Service engine
loaded: corosync cluster quorum service v0.1
Jul 9 17:55:55 clnode1p corosync[2514]: [CMAN ] CMAN 3.0.12.1 (built
May 8 2012 12:22:26) started
Jul 9 17:55:55 clnode1p corosync[2514]: [SERV ] Service engine
loaded: corosync CMAN membership service 2.90
Jul 9 17:55:55 clnode1p corosync[2514]: [SERV ] Service engine
loaded: openais checkpoint service B.01.01
Jul 9 17:55:55 clnode1p corosync[2514]: [SERV ] Service engine
loaded: corosync extended virtual synchrony service
Jul 9 17:55:55 clnode1p corosync[2514]: [SERV ] Service engine
loaded: corosync configuration service
Jul 9 17:55:55 clnode1p corosync[2514]: [SERV ] Service engine
loaded: corosync cluster closed process group service v1.01
Jul 9 17:55:55 clnode1p corosync[2514]: [SERV ] Service engine
loaded: corosync cluster config database access v1.01
Jul 9 17:55:55 clnode1p corosync[2514]: [SERV ] Service engine
loaded: corosync profile loading service
Jul 9 17:55:55 clnode1p corosync[2514]: [QUORUM] Using quorum
provider quorum_cman
Jul 9 17:55:55 clnode1p corosync[2514]: [SERV ] Service engine
loaded: corosync cluster quorum service v0.1
Jul 9 17:55:55 clnode1p corosync[2514]: [MAIN ] Compatibility mode
set to whitetank. Using V1 and V2 of the synchronization engine.
Jul 9 17:55:55 clnode1p corosync[2514]: [TOTEM ] A processor joined
or left the membership and a new membership was formed.
Jul 9 17:55:55 clnode1p corosync[2514]: [QUORUM] Members[1]: 1
Jul 9 17:55:55 clnode1p corosync[2514]: [QUORUM] Members[1]: 1
Jul 9 17:55:55 clnode1p corosync[2514]: [CPG ] chosen downlist:
sender r(0) ip(172.16.255.1) ; members(old:0 left:0)
Jul 9 17:55:55 clnode1p corosync[2514]: [MAIN ] Completed service
synchronization, ready to provide service.
Jul 9 17:55:55 clnode1p corosync[2514]: [TOTEM ] A processor joined
or left the membership and a new membership was formed.
Jul 9 17:55:55 clnode1p corosync[2514]: [CMAN ] quorum regained,
resuming activity
Jul 9 17:55:55 clnode1p corosync[2514]: [QUORUM] This node is within
the primary component and will provide service.
Jul 9 17:55:55 clnode1p corosync[2514]: [QUORUM] Members[2]: 1 2
Jul 9 17:55:55 clnode1p corosync[2514]: [QUORUM] Members[2]: 1 2
Jul 9 17:55:55 clnode1p corosync[2514]: [CPG ] chosen downlist:
sender r(0) ip(172.16.255.1) ; members(old:1 left:0)
Jul 9 17:55:55 clnode1p corosync[2514]: [MAIN ] Completed service
synchronization, ready to provide service.
Jul 9 17:55:59 clnode1p kernel: bond1: no IPv6 routers present
Jul 9 17:55:59 clnode1p qdiskd[2564]: Loading dynamic configuration
Jul 9 17:55:59 clnode1p qdiskd[2564]: Setting votes to 1
Jul 9 17:55:59 clnode1p qdiskd[2564]: Loading static configuration
Jul 9 17:55:59 clnode1p qdiskd[2564]: Timings: 8 tko, 1 interval
Jul 9 17:55:59 clnode1p qdiskd[2564]: Timings: 2 tko_up, 4 master_wait,
2 upgrade_wait
Jul 9 17:55:59 clnode1p qdiskd[2564]: Heuristic: '/bin/ping -c1 -w1
clswitch1m' score=1 interval=2 tko=4
Jul 9 17:55:59 clnode1p qdiskd[2564]: Heuristic: '/bin/ping -c1 -w1
clswitch2m' score=1 interval=2 tko=4
Jul 9 17:55:59 clnode1p qdiskd[2564]: 2 heuristics loaded
Jul 9 17:55:59 clnode1p qdiskd[2564]: Quorum Daemon: 2 heuristics, 1
interval, 8 tko, 1 votes
Jul 9 17:55:59 clnode1p qdiskd[2564]: Run Flags: 00000271
Jul 9 17:55:59 clnode1p qdiskd[2564]: stat
Jul 9 17:55:59 clnode1p qdiskd[2564]: qdisk_validate: No such file or
directory
Jul 9 17:55:59 clnode1p qdiskd[2564]: Specified partition
/dev/mapper/apsto1-vd01-v001 does not have a qdisk label
Jul 9 17:56:01 clnode1p corosync[2514]: [SERV ] Unloading all
Corosync service engines.
Jul 9 17:56:01 clnode1p corosync[2514]: [SERV ] Service engine
unloaded: corosync extended virtual synchrony service
Jul 9 17:56:01 clnode1p corosync[2514]: [SERV ] Service engine
unloaded: corosync configuration service
Jul 9 17:56:01 clnode1p corosync[2514]: [SERV ] Service engine
unloaded: corosync cluster closed process group service v1.01
Jul 9 17:56:01 clnode1p corosync[2514]: [SERV ] Service engine
unloaded: corosync cluster config database access v1.01
Jul 9 17:56:01 clnode1p corosync[2514]: [SERV ] Service engine
unloaded: corosync profile loading service
Jul 9 17:56:01 clnode1p corosync[2514]: [SERV ] Service engine
unloaded: openais checkpoint service B.01.01
Jul 9 17:56:01 clnode1p corosync[2514]: [SERV ] Service engine
unloaded: corosync CMAN membership service 2.90
Jul 9 17:56:01 clnode1p corosync[2514]: [SERV ] Service engine
unloaded: corosync cluster quorum service v0.1
Jul 9 17:56:01 clnode1p corosync[2514]: [MAIN ] Corosync Cluster
Engine exiting with status 0 at main.c:1864.
And the node remains in this state even if the storage is reattached later
on, so at the moment I have only one functioning node.
What can be done to fix this (i.e., to get the cluster framework started again)?
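For what it's worth, here is what I would try for checking things after the
storage is reattached. This is only a rough sketch, assuming the mkqdisk
utility shipped with the cman package is available; the device path is the
one reported in the error message above:

```shell
# Check that the multipath device reported in the log is back after
# the storage is reattached (run as root).
ls -l /dev/mapper/apsto1-vd01-v001

# List all block devices that currently carry a qdisk label; the device
# above should appear here if its label is intact.
mkqdisk -L

# If the label is visible again, retry starting the cluster stack.
service cman start
```

If mkqdisk -L does not show the device even though it is reachable, the
label may genuinely be gone and would need to be investigated further
before qdiskd can use the partition again.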
Thank you,
Laszlo
--
Acceleris System Integration | and IT works
Laszlo Budai | Technical Consultant
Bvd. Barbu Vacarescu 80 | RO-020282 Bucuresti
t +40 21 23 11 538
laszlo.budai@xxxxxxxxxxxx | www.acceleris.ro
Acceleris Offices are in:
Basel | Bucharest | Zollikofen | Renens | Kloten
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster