Re: quorum device not getting a vote causes 2-node cluster to be inquorate

The pithy ruminations from "Fabio M. Di Nitto" <fdinitto@xxxxxxxxxx> on "Re:  quorum device not getting a vote causes 2-node cluster to be inquorate" were:

=> On 03/15/2011 05:11 AM, bergman@xxxxxxxxxxxx wrote:
=> > I have been using a 2-node cluster with a quorum disk successfully for
=> > about 2 years. Beginning today, the cluster will not boot correctly.
=> > 
=> > The RHCS services start, but fencing fails with:
=> > 	
=> > 	dlm: no local IP address has been set
=> > 	dlm: cannot start dlm lowcomms -107
=> > 
=> > This seems to be a symptom of the fact that the cluster votes do not include votes from the quorum
=> > device:
=> > 
=> > 	# clustat
=> > 	Cluster Status for example-infra @ Tue Mar 15 00:02:35 2011
=> > 	Member Status: Inquorate
=> > 
=> > 	Member Name                                              ID   Status
=> > 	------ ----                                              ---- ------
=> > 	example-infr2-admin.domain.com                              1 Online, Local
=> > 	example-infr1-admin.domain.com                              2 Offline
=> > 	/dev/mpath/quorum                                           0 Offline
=> > 
=> > 	[root@example-infr2 ~]# cman_tool status
=> > 	Version: 6.2.0
=> > 	Config Version: 239
=> > 	Cluster Name: example-infra
=> > 	Cluster Id: 42813
=> > 	Cluster Member: Yes
=> > 	Cluster Generation: 676844
=> > 	Membership state: Cluster-Member
=> > 	Nodes: 1
=> > 	Expected votes: 2
=> > 	Total votes: 1
=> > 	Quorum: 2 Activity blocked
=> > 	Active subsystems: 7
=> > 	Flags: 
=> > 	Ports Bound: 0  
=> > 	Node name: example-infr2-admin.domain.com
=> > 	Node ID: 1
=> > 	Multicast addresses: 239.192.167.228 
=> > 	Node addresses: 192.168.110.3 
=> 
=> You should check the output from cman_tool nodes. It appears that the
=> nodes are not seeing each other at all.

That's correct... at the time I ran cman_tool and clustat, one node was down (deliberately, as part of troubleshooting, but the same state would occur after a hardware failure).
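For reference, this is roughly what cman_tool nodes reports here on the surviving node (layout from memory, with the Inc and Joined columns elided; "M" is a member, "X" is a node cman cannot see):

	# cman_tool nodes
	Node  Sts   Inc   Joined   Name
	   1   M    ...   ...      example-infr2-admin.domain.com
	   2   X    ...   ...      example-infr1-admin.domain.com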

As I see it, the problem is not with inter-node communication but with the quorum device. Note that only one vote is registered; the quorum device is contributing none. The quorum device should provide enough votes to make the "cluster" quorate when only one node is running.
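To pin the problem on the quorum-device side, these are the checks I'd expect to be conclusive (all stock cman/qdisk tooling; the log path is the RHEL default):

	# service qdiskd status            # is the quorum daemon running at all?
	# mkqdisk -L                       # can this node find and read the qdisk label?
	# grep -i qdisk /var/log/messages  # registration, heartbeat and eviction messages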

If I understand it correctly, this should also let the "cluster" start with a single node (as long as that node can write to the quorum device). If my understanding is wrong, then how can a 2-node cluster start if one node is down?
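If I understand the vote arithmetic, the configuration that makes this work is along these lines (a sketch, not our exact cluster.conf; the interval and tko values are illustrative):

	<cman expected_votes="3"/>
	<quorumd interval="1" tko="10" votes="1" device="/dev/mpath/quorum"/>

With one vote per node and one for the qdisk, quorum works out to 2, so a lone node plus a registered quorum device should be quorate. The "Total votes: 1" above says the qdisk vote is never being registered at all.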

=> 
=> The first things I would check are iptables, whether the node names
=> resolve to the correct IP addresses, SELinux, and possibly whether the
=> switch between the nodes supports multicast.

SELinux is disabled (as it has been for the two years this cluster has been operational).

There have been no switch changes.

Node names & IPs resolve correctly.

iptables permits all traffic between the "admin" addresses on the two servers.
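For completeness, these are roughly the checks behind the statements above (the interface name eth0 is a guess; substitute whatever carries 192.168.110.x):

	# getenforce                                    # expect "Disabled"
	# getent hosts example-infr1-admin.domain.com   # names resolve to the right IPs?
	# iptables -L -n                                # nothing blocking the admin subnet
	# tcpdump -i eth0 host 239.192.167.228          # is cman multicast on the wire?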

=> 
=> Fabio
=> 

Thanks,

Mark

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

