Re: Cman (and corosync) starting before network interface is ready

"Facundo M. de la Cruz" <fmdlc.unix@xxxxxxxxx> · Thu, 18 Sep 2014 10:47:28 -0300

On Sep 18, 2014, at 10:25, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:

> On 18/09/14 14:09, Vallevand, Mark K wrote:
>> Hmmm.  I'm still curious what two_node exactly does.
>> 
>> In my testing, the clustering software comes up before the network is completely ready.  (Why?  That's another day.)
>> 
>> With just no-quorum-policy=ignore, regardless of the fence_join_delay value, the rebooted node fences the other node and starts up all split-brain.  It takes about 30 seconds or so after the network is ready for the split brain to be detected.
>> 
>> With no-quorum-policy=ignore and two_node="1" expected_votes="1", regardless of the fence_join_delay value, the rebooted node fences the other node, but as soon as the network is ready the other node joins the network and there is no split-brain.
>> 
>> I'm happy that things are working, but I'm still curious for some idea about what two_node does.
>> 
>> 
> 
> two_node is simply to allow a 2 node cluster to remain quorate when one node is unavailable - it's a special case that allows the cluster to remain running when quorum is 1. It requires hardware fencing to make sure that one node is fenced and can't do any harm to the remaining node. It's nothing more complicated than that.
> 
> If <fence_daemon post_join_delay="x"/> is set (and this is not directly part of two_node, but useful to know for all clusters) then the other node will not be fenced for x seconds after the first node starts up, which should take care of your fence trouble.
> 
> Sorry, I misnamed the parameter in my first email.
> 
> Chrissie
> 
>> Regards.
>> Mark K Vallevand
>> 
>> "If there are no dogs in Heaven, then when I die I want to go where they went."
>> -Will Rogers
>> 
>> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
>> 
>> 
>> -----Original Message-----
>> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Christine Caulfield
>> Sent: Thursday, September 18, 2014 03:33 AM
>> To: Andrew Beekhof
>> Cc: linux clustering
>> Subject: Re:  Cman (and corosync) starting before network interface is ready
>> 
>> On 18/09/14 09:29, Andrew Beekhof wrote:
>>> 
>>> On 18 Sep 2014, at 6:18 pm, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
>>> 
>>>> On 18/09/14 02:35, Andrew Beekhof wrote:
>>>>> 
>>>>> On 18 Sep 2014, at 12:34 am, Vallevand, Mark K <Mark.Vallevand@xxxxxxxxxx> wrote:
>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> 1. I didn't know about two-node mode.  Thanks.  We are testing with two nodes and "crm configure property no-quorum-policy=ignore".  When one node goes down, the other node continues clustering.  This is the desired behavior.  What will <cman two_node="1" expected_votes="1"> </cman> in cluster.conf do?
>>>>> 
>>>>> I was all set to be a smart-ass and say 'man cluster.conf', but the joke is on me as my colleagues do not appear to have documented it anywhere.
>>>>> Chrissie: Can you elaborate on the details here please?
>>>>> 
>>>> 
>>>> it's documented in the cman(5) man page. The entries in cluster.conf only cover the general parts that are not specific to any subsystem. So corosync items are documented in the corosync man page and cman ones in the cman man page etc.
>>> 
>>> Ah! Good to know.
>>> 
>>>         Two node clusters
>>>                Ordinarily,  the loss of quorum after one out of two nodes fails will prevent the remaining node from continuing (if both nodes have one vote.)  Special configuration options can be set to allow the one remaining node to continue operating if the other
>>>                fails.  To do this only two nodes, each with one vote, can be defined in cluster.conf.  The two_node and expected_votes values must then be set to 1 in the cman section as follows.
>>> 
>>>                  <cman two_node="1" expected_votes="1">
>>>                  </cman>
>>> 
>>> One thing that's not clear to me is what happens when a single node comes up and can only see itself.
>>> Does it get quorum or is it like wait-for-all in corosync2?
>>> 
>> 
>> 
>> There's no wait_for_all in cman. The first node up will attempt (after
>> fence_join_delay) to fence the other node in an attempt to stop a split brain.
>> 
>> This is one of several reasons why we insist that the fencing is on a
>> separate network to heartbeat on a two_node cluster.
>> 
>> 
>> Chrissie
>> 
>>>> 
>>>> Chrissie
>>>> 
>>>> 
>>>>> (Short version, it should do what you want)
>>>>> 
>>>>>> 2. Yes, fencing is part of our plan, but not at this time.  In the configurations we are testing, fencing is a RFPITA.
>>>>>> 3. We could move up.  We like Ubuntu 12.04 LTS because it is Long Term Support.  But, we've upgraded packages as necessary.  So, if we move to the latest stable Pacemaker, Cman and Corosync (and others?), how could this help?
>>>>> 
>>>>> Well you might get 3+ years of bug fixes and performance improvements :-)
>>>>> 
>>>>>> 
>>>>>> Is there a way to get the clustering software to 'poll' faster?  I mean, this NIC stalling at boot time only lasts about 2 seconds beyond the start of corosync.  But, it's 30 more seconds before the nodes see each other.  I see lots of parameters in the totem directive that seem interesting.  Would any of them be appropriate?
>>>>> 
>>>>> Is there not a way to tell upstart not to start the cluster until the network is up?
>>>>> 
>>>>>> 
>>>>>> Andrew: Thanks for the prompt response.
>>>>>> 
>>>>>> 
>>>>>> Regards.
>>>>>> Mark K Vallevand
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Andrew Beekhof
>>>>>> Sent: Tuesday, September 16, 2014 08:51 PM
>>>>>> To: linux clustering
>>>>>> Subject: Re:  Cman (and corosync) starting before network interface is ready
>>>>>> 
>>>>>> 
>>>>>> On 17 Sep 2014, at 7:20 am, Vallevand, Mark K <Mark.Vallevand@xxxxxxxxxx> wrote:
>>>>>> 
>>>>>>> It looks like there is some odd delay in getting a network interface up and ready.  So, when cman starts corosync, it can't get to the cluster.  So, for a time, the node is a member of a cluster-of-one.  The cluster-of-one begins starting resources.
>>>>>> 
>>>>>> 1. enable two-node mode in cluster.conf (man page should indicate where/how) then disable no-quorum-policy=ignore
>>>>>> 2. configure fencing
>>>>>> 3. find a newer version of pacemaker, we're up to .12 now
>>>>>> 
>>>>>>> A few seconds later, when the interface finally is up and ready, it takes about 30 more seconds for the cluster-of-one to finally rejoin the larger cluster.  The doubly-started resources are sorted out and all ends up OK.
>>>>>>> 
>>>>>>> Now, this is not a good thing to have these particular resources running twice.  I'd really like the clustering software to behave better.  But, I'm not sure what 'behave better' would be.
>>>>>>> 
>>>>>>> Is it possible to introduce a delay into cman or corosync startup?  Is that even wise?
>>>>>>> Is there a parameter to get the clustering software to poll more often when it can't rejoin the cluster?
>>>>>>> 
>>>>>>> Any suggestions would be welcome.
>>>>>>> 
>>>>>>> Running Ubuntu 12.04 LTS.  Pacemaker 1.1.6.  Cman 3.1.7.  Corosync 1.4.2.
>>>>>>> 
>>>>>>> Regards.
>>>>>>> Mark K Vallevand
>>>>>>> --
>>>>>>> Linux-cluster mailing list
>>>>>>> Linux-cluster@xxxxxxxxxx
>>>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 

Hi, let me add something more:
The "post_join_delay" option (called "fence_join_delay" earlier in this thread, before Chrissie corrected the name) is used to avoid a condition called "dual fencing", which can leave your cluster entirely powered down.
"Dual fencing" can occur if you have a two-node cluster with no quorum and you are fencing through IPMI agents: both nodes try to execute fencing actions against each other over the IPMI interface.
"post_join_delay" works by adding a countdown that keeps the remaining node from executing fencing actions for X seconds, so that only the first node can send an IPMI command to the other one.
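
As a rough sketch (not a tested configuration), the cman settings from the man page excerpt quoted above and the fence_daemon delay that Chrissie describes (post_join_delay) might sit together in cluster.conf like this. The cluster name, the node names and the 30-second value are placeholders made up for illustration, and the fence device definitions that a real two-node cluster needs are left out:

```xml
<?xml version="1.0"?>
<!-- Sketch only: "example", "node1", "node2" and the value 30 are placeholders. -->
<cluster name="example" config_version="1">
  <!-- Two-node special case: the cluster stays quorate with a single vote. -->
  <cman two_node="1" expected_votes="1"/>
  <!-- Hold off fencing for this many seconds after a node joins the fence domain. -->
  <fence_daemon post_join_delay="30"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
  </clusternodes>
  <!-- Fence devices and per-node fence methods intentionally omitted. -->
</cluster>
```

The delay only helps if it is longer than the time the network needs to come up after cman starts, so the right value depends on the hardware.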

Best regards.

-- 
Facundo M. de la Cruz (tty0)
Information Technology Specialist
Movil: +54 911 56528301

http://codigounix.blogspot.com/
http://twitter.com/_tty0

GPG fingerprint: DF2F 514A 5167 00F5 C753 BF3B D797 C8E1 5726 0789

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning.” - Rich Cook
