Ok. What iptables rules are you using to block traffic? Please make sure to keep at least lo working, and to block BOTH directions (so INPUT and OUTPUT). Something like the following usually works well:

iptables -A INPUT ! -i lo -p udp -j DROP && iptables -A OUTPUT ! -o lo -p udp -j DROP
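For completeness, a minimal sketch of the whole isolate/restore cycle (plain iptables, nothing corosync-specific; -D deletes exactly the rule that -A appended):

  # Isolate the node: drop all non-loopback UDP in both directions.
  # This cuts off corosync (udpu runs over UDP) while leaving lo untouched.
  iptables -A INPUT  ! -i lo -p udp -j DROP
  iptables -A OUTPUT ! -o lo -p udp -j DROP

  # ... observe corosync-quorumtool -s on every node here ...

  # Heal the partition: delete exactly the rules appended above.
  iptables -D INPUT  ! -i lo -p udp -j DROP
  iptables -D OUTPUT ! -o lo -p udp -j DROP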
Honza

Mark Round wrote:
> Same behaviour. I switched to the CentOS 6.4-provided Pacemaker, Corosync and CMAN. I configured a CMAN cluster of 4 nodes, and split one node off via iptables DROP.
>
> It now looks like this on the 3 nodes in one partition:
>
> # corosync-quorumtool -s
> Version: 1.4.1
> Nodes: 3
> Ring ID: 1152
> Quorum type: quorum_cman
> Quorate: No
>
> However, the one victim node still thinks it has quorum after I drop everything with iptables:
>
> # corosync-quorumtool -s
> Version: 1.4.1
> Nodes: 4
> Ring ID: 1148
> Quorum type: quorum_cman
> Quorate: Yes
>
> Note the old ring ID on the victim node. When I allow traffic again, both partitions merge and I get a new ring ID:
>
> # corosync-quorumtool -s
> Version: 1.4.1
> Nodes: 4
> Ring ID: 1160
> Quorum type: quorum_cman
> Quorate: Yes
>
> So, it's the same behaviour. One node on its own cannot seem to decide that it is in a partition on its own...
>
> -----Original Message-----
> From: Jan Friesse [mailto:jfriesse@xxxxxxxxxx]
> Sent: 05 September 2013 13:10
> To: Mark Round; discuss@xxxxxxxxxxxx
> Subject: Re: Corosync quorum not updating on split node
>
> Mark,
> quorum in 1.4.x has some problems (this may be one of them), which is why it was completely rewritten in 2.x.
>
> Can you please try to use cman and its quorum module? Cman quorum is well tested and should work.
>
> Regards,
>   Honza
>
> Mark Round wrote:
>> Hi all,
>>
>> I have a problem whereby, when I create a network split/partition (by dropping traffic with iptables), the victim node for some reason does not realise it has split from the network. If I split a cluster into two partitions that both have multiple nodes, one with quorum and one without, then things function as expected; it just appears that a single node on its own can't work out that it doesn't have quorum if it has no other nodes to talk to.
>>
>> A single victim node seems to recognise that it can't form a cluster due to network issues, but this status is not reflected in the output from corosync-quorumtool, and cluster services (via Pacemaker) still continue to run. However, the other nodes in the rest of the cluster do realise they have lost contact with a node, no longer have quorum, and correctly shut down services.
>>
>> When I block traffic on the victim node's eth0, the remaining nodes see that they cannot communicate with it and shut down:
>>
>> # corosync-quorumtool -s
>> Version: 1.4.5
>> Nodes: 3
>> Ring ID: 696
>> Quorum type: corosync_votequorum
>> Quorate: No
>> Node votes: 1
>> Expected votes: 7
>> Highest expected: 7
>> Total votes: 3
>> Quorum: 4 Activity blocked
>> Flags:
>>
>> However, the victim node still thinks everything is fine, and maintains its view of the cluster from prior to the split:
>>
>> # corosync-quorumtool -s
>> Version: 1.4.5
>> Nodes: 4
>> Ring ID: 716
>> Quorum type: corosync_votequorum
>> Quorate: Yes
>> Node votes: 1
>> Expected votes: 7
>> Highest expected: 7
>> Total votes: 4
>> Quorum: 4
>> Flags: Quorate
>>
>> It does, however, notice in the logs that it can no longer form a cluster, as the following message repeats constantly:
>>
>> corosync [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>
>> I would expect it at this point to be in its own network partition with a total of 1 vote, and to block activity. However, this does not seem to happen until just after it rejoins the cluster. When I unblock traffic and it rejoins, I see the victim finally realise it had lost quorum:
>>
>> Sep 05 09:52:21 corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 720: memb=1, new=0, lost=3
>> Sep 05 09:52:21 corosync [VOTEQ ] quorum lost, blocking activity
>> Sep 05 09:52:21 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services.
>> Sep 05 09:52:21 corosync [QUORUM] Members[1]: 358898186
>>
>> And a second or so later it regains quorum:
>>
>> crmd: notice: ais_dispatch_message: Membership 736: quorum acquired
>>
>> So my question is: why, when it realises it cannot form a cluster ("Totem is unable to form..."), does it not lose quorum, update the status reflected by quorumtool, and shut down cluster services?
>>
>> An example configuration file and the package versions/environment are listed below. I'm using the "udpu" transport as we need to avoid multicast in this environment; it will eventually be using a routed network. This behaviour also persists when I disable the Pacemaker plugin and just test with corosync alone.
>>
>> compatibility: whitetank
>>
>> totem {
>>     version: 2
>>     secauth: off
>>     interface {
>>         member {
>>             memberaddr: 10.90.100.20
>>         }
>>         member {
>>             memberaddr: 10.90.100.21
>>         }
>>         ...
>>         ... more nodes snipped
>>         ...
>>         ringnumber: 0
>>         bindnetaddr: 10.90.100.20
>>         mcastport: 5405
>>     }
>>     transport: udpu
>> }
>>
>> amf {
>>     mode: disabled
>> }
>>
>> aisexec {
>>     user: root
>>     group: root
>> }
>>
>> quorum {
>>     provider: corosync_votequorum
>>     expected_votes: 7
>> }
>>
>> service {
>>     # Load the Pacemaker Cluster Resource Manager
>>     name: pacemaker
>>     ver: 0
>> }
>>
>> Environment: CentOS 6.4
>> Packages from openSUSE:
>> http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/x86_64/
>>
>> # rpm -qa | egrep "^(cluster|corosync|crm|libqb|pacemaker|resource-agents)" | sort
>> cluster-glue-1.0.11-3.1.x86_64
>> cluster-glue-libs-1.0.11-3.1.x86_64
>> corosync-1.4.5-2.2.x86_64
>> corosynclib-1.4.5-2.2.x86_64
>> crmsh-1.2.6-0.rc3.3.1.x86_64
>> libqb0-0.14.4-1.2.x86_64
>> pacemaker-1.1.9-2.1.x86_64
>> pacemaker-cli-1.1.9-2.1.x86_64
>> pacemaker-cluster-libs-1.1.9-2.1.x86_64
>> pacemaker-libs-1.1.9-2.1.x86_64
>> resource-agents-3.9.5-3.1.x86_64
>>
>> Regards,
>>
>> -Mark
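One more thought: if you do end up trying the rewritten quorum in corosync 2.x, the votequorum section looks much like yours. A minimal sketch (option names per votequorum(5); the two extra options are optional and shown only as illustration):

  quorum {
      provider: corosync_votequorum
      expected_votes: 7       # or let 2.x derive it from a nodelist section

      # Optional 2.x behaviours:
      wait_for_all: 1         # do not become quorate for the first time until
                              # all nodes have been seen at once
      last_man_standing: 1    # recalculate expected_votes as nodes leave
                              # the cluster cleanly
  }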
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss