Re: Cman (and corosync) starting before network interface is ready

"Facundo M. de la Cruz" <fmdlc.unix@xxxxxxxxx> · Wed, 17 Sep 2014 17:44:06 -0300

On Sep 17, 2014, at 17:35, Vallevand, Mark K <Mark.Vallevand@xxxxxxxxxx> wrote:

> WooHoo.
> 
> I added:
>  <cman two_node="1" expected_votes="1"> 
>  </cman>
> in cluster.conf and I think it's working.
> 
> So, what does the two_node do?
> 
> And, a follow up question:
> What will happen if "crm configure property no-quorum-policy=ignore" is set on clusters with more than 2 nodes?
> Should I skip that on clusters with more than two nodes?
> 
> 
> Regards.
> Mark K Vallevand
> 
> "If there are no dogs in Heaven, then when I die I want to go where they went." 
> -Will Rogers
> 
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
> 
> 
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Vallevand, Mark K
> Sent: Wednesday, September 17, 2014 02:07 PM
> To: linux clustering
> Subject: Re:  Cman (and corosync) starting before network interface is ready
> 
> Oops.  In number 2, I read fencing as STONITH.  My bad.
> I think some form of fencing is configured.
> My cluster.conf file has this in it:
>  <fencedevices>
>     <fencedevice name="pcmk" agent="fence_pcmk"/>
>   </fencedevices>
> Does that configure fencing?
> 
> I'm considering adding this to the cluster.conf:
>  <fence_daemon post_join_delay="60">
>  </fence_daemon>
> This raises the initial join delay when clustering starts.  Default is 6
> seconds.  6 seconds kind of matches what I am seeing when clustering starts
> and the NIC link is slow to go up.
> 
> 
> Regards.
> Mark K Vallevand
> 
> 
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Vallevand, Mark K
> Sent: Wednesday, September 17, 2014 09:35 AM
> To: linux clustering
> Subject: Re:  Cman (and corosync) starting before network interface is ready
> 
> Thanks.
> 
> 1. I didn't know about two-node mode.  Thanks.  We are testing with two nodes and "crm configure property no-quorum-policy=ignore".  When one node goes down, the other node continues clustering.  This is the desired behavior.  What will <cman two_node="1" expected_votes="1"> </cman> in cluster.conf do?
> 2. Yes, fencing is part of our plan, but not at this time.  In the configurations we are testing, fencing is a RFPITA.
> 3. We could move up.  We like Ubuntu 12.04 LTS because it is Long Term Support.  But, we've upgraded packages as necessary.  So, if we move to the latest stable Pacemaker, Cman and Corosync (and others?), how could this help?
> 
> Is there a way to get the clustering software to 'poll' faster?  I mean, this NIC stalling at boot time only lasts about 2 seconds beyond the start of corosync.  But, it's 30 more seconds before the nodes see each other.  I see lots of parameters in the totem directive that seem interesting.  Would any of them be appropriate?
> 
> Andrew: Thanks for the prompt response.
> 
> 
> Regards.
> Mark K Vallevand
> 
> 
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Andrew Beekhof
> Sent: Tuesday, September 16, 2014 08:51 PM
> To: linux clustering
> Subject: Re:  Cman (and corosync) starting before network interface is ready
> 
> 
> On 17 Sep 2014, at 7:20 am, Vallevand, Mark K <Mark.Vallevand@xxxxxxxxxx> wrote:
> 
>> It looks like there is some odd delay in getting a network interface up and ready.  So, when cman starts corosync, it can't get to the cluster.  So, for a time, the node is a member of a cluster-of-one.  The cluster-of-one begins starting resources.
> 
> 1. enable two-node mode in cluster.conf (man page should indicate where/how) then disable no-quorum-policy=ignore
> 2. configure fencing
> 3. find a newer version of pacemaker, we're up to .12 now
> 
>> A few seconds later, when the interface finally is up and ready, it takes about 30 more seconds for the cluster-of-one to finally rejoin the larger cluster.  The doubly-started resources are sorted out and all ends up OK.
>> 
>> Now, this is not a good thing to have these particular resources running twice.  I'd really like the clustering software to behave better.  But, I'm not sure what 'behave better' would be.
>> 
>> Is it possible to introduce a delay into cman or corosync startup?  Is that even wise?
>> Is there a parameter to get the clustering software to poll more often when it can't rejoin the cluster?
>> 
>> Any suggestions would be welcome.
>> 
>> Running Ubuntu 12.04 LTS.  Pacemaker 1.1.6.  Cman 3.1.7.  Corosync 1.4.2.
>> 
>> Regards. 
>> Mark K Vallevand
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 

The two_node="1" option tells the cluster manager to keep operating with only one vote. 
It requires expected_votes to be set to 1, so that if you lose one cluster node, the surviving node still has quorum and keeps running.
Normally, expected_votes is set automatically to the sum of the votes of all defined cluster nodes (each node has 1 vote by default). 
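
To make the vote arithmetic concrete, here is a rough Python sketch. It assumes a plain majority rule (more than half of expected_votes) and treats two_node as a special case that drops the threshold to 1. The function names are made up for illustration; this is a simplified model, not cman's actual code.

```python
def quorum_threshold(expected_votes: int, two_node: bool = False) -> int:
    """Votes needed for quorum in a simplified majority model (assumption)."""
    if two_node:
        # Models two_node="1" with expected_votes="1": one vote is enough.
        return 1
    # Plain majority: more than half of the expected votes.
    return expected_votes // 2 + 1


def has_quorum(votes_present: int, expected_votes: int, two_node: bool = False) -> bool:
    """True if the votes currently present reach the quorum threshold."""
    return votes_present >= quorum_threshold(expected_votes, two_node)


# Two nodes with one vote each, and one node is lost:
print(has_quorum(1, expected_votes=2))                 # default majority rule
print(has_quorum(1, expected_votes=1, two_node=True))  # two-node mode
```

Under the default rule a lone survivor of a two-node cluster has 1 vote out of a required 2, so it is not quorate. In two-node mode the same survivor stays quorate.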

Regards.

-- 
Facundo M. de la Cruz (tty0)
Information Technology Specialist
Movil: +54 911 56528301

http://codigounix.blogspot.com/
http://twitter.com/_tty0

GPG fingerprint: DF2F 514A 5167 00F5 C753 BF3B D797 C8E1 5726 0789

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning.” - Rich Cook

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster