Lots of "lost" nodes.

Mark Round <Mark.Round@xxxxxxxxxxxx> · Wed, 23 Oct 2013 09:07:08 +0000

Hi all,

I’m seeing lots of nodes being “lost” in my logs:

Oct 22 16:47:46 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:18:48 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:47:14 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 22 21:17:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 04:27:48 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:47 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:50 [10452] s3db1       crmd:     info: do_election_count_vote:  Election 11 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 07:18:50 [10452] s3db1       crmd:     info: do_election_count_vote:  Election 12 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 08:10:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 08:10:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 23 08:10:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3quorum1 1141053100
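One way to see the pattern in an excerpt like this is to tally the "pcmk_peer_update: lost:" events per node and group them by timestamp. The Python 3 sketch below does that. Its regular expression is written against the log lines shown here and is an assumption, not a documented log format, so other corosync/pacemaker versions may need a different pattern; summarise_lost is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict

# Assumed format, taken from the excerpt above (not a documented log schema):
# "Oct 22 16:47:46 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812"
LOST_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) corosync \[pcmk\s*\] info: "
    r"pcmk_peer_update: lost: (?P<node>\S+) (?P<nodeid>\d+)"
)


def summarise_lost(lines):
    """Hypothetical helper: count "lost" events per node and group them by second."""
    per_node = Counter()
    by_time = defaultdict(list)
    for line in lines:
        m = LOST_RE.match(line)
        if m is None:
            continue  # skips e.g. crmd election lines, which also contain "lost"
        per_node[m.group("node")] += 1
        by_time[m.group("ts")].append(m.group("node"))
    return per_node, dict(by_time)


# The log excerpt from this post, verbatim.
SAMPLE = """\
Oct 22 16:47:46 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:18:48 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:47:14 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 22 21:17:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 04:27:48 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:47 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:50 [10452] s3db1       crmd:     info: do_election_count_vote:  Election 11 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 07:18:50 [10452] s3db1       crmd:     info: do_election_count_vote:  Election 12 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 08:10:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 08:10:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 23 08:10:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3quorum1 1141053100
"""

if __name__ == "__main__":
    counts, by_time = summarise_lost(SAMPLE.splitlines())
    print(counts.most_common())
    # Several peers lost in the same second (08:10:42 here) most likely come
    # from a single membership change, worth examining separately from the
    # one-node losses.
    for ts, nodes in by_time.items():
        if len(nodes) > 1:
            print(ts, nodes)
```

Run against the excerpt, this prints [('s3db5', 5), ('s3db3', 3), ('s3quorum1', 1)] followed by the single multi-node event at Oct 23 08:10:42.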

This is occurring on systems that are nearly 100% idle, on a very quiet 10Gb/s network with no other activity and no packet loss. Any idea what else could be causing this? I notice that the “Initial Configuration” page on the ClusterLabs wiki (http://clusterlabs.org/wiki/Initial_Configuration) appears to use tweaked values for settings that deal with timeouts (see below). I have kept everything at the stock defaults, except that I’m using udpu. Are these settings a likely candidate? Is it recommended to apply these values in a production environment?

Recommended values from the wiki:

    # How long before declaring a token lost (ms)
    token:          5000

    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 20

    # How long to wait for join messages in the membership protocol (ms)
    join:           1000

    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus:      7500

    # Turn off the virtual synchrony filter
    vsftype:        none

    # Number of messages that may be sent by one processor on receipt of the token
    max_messages:   20

    # Disable encryption
    secauth:        off

    # How many threads to use for encryption/decryption
    threads:        0
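To compare the stock defaults with the wiki's values, it helps to see which timers corosync derives from token. The Python sketch below assumes the corosync 1.x behaviour described in corosync.conf(5): consensus defaults to 1.2 * token and must not be set lower, and the token retransmit interval is derived from token and token_retransmits_before_loss_const. The exact retransmit formula used here, token / (retransmits + 0.2) truncated to an integer, is an assumption, chosen because it reproduces the documented 238 ms default. The stock values (token 1000 ms, 4 retransmits) are likewise assumed and should be checked against the man page for the installed version; TotemTimers is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TotemTimers:
    token: int                       # ms before a token is declared lost
    retransmits: int                 # token_retransmits_before_loss_const
    consensus: Optional[int] = None  # ms; None means "let corosync derive it"

    def consensus_ms(self) -> int:
        # corosync.conf(5): if unset, consensus is 1.2 * token (integer maths here).
        return self.consensus if self.consensus is not None else self.token * 6 // 5

    def retransmit_ms(self) -> int:
        # Assumed derivation: token / (retransmits + 0.2), truncated.
        # For the assumed defaults this gives 1000 / 4.2 -> 238 ms.
        return int(self.token / (self.retransmits + 0.2))

    def problems(self) -> list:
        issues = []
        # consensus must be at least 1.2 * token (compared without floats).
        if self.consensus_ms() * 5 < self.token * 6:
            issues.append(
                f"consensus {self.consensus_ms()} ms is below "
                f"1.2 * token ({self.token * 6 // 5} ms)"
            )
        return issues


# Assumed corosync 1.x defaults (check corosync.conf(5) for your version).
STOCK = TotemTimers(token=1000, retransmits=4)
# Values from the ClusterLabs wiki excerpt above.
WIKI = TotemTimers(token=5000, retransmits=20, consensus=7500)

if __name__ == "__main__":
    for name, t in (("stock", STOCK), ("wiki", WIKI)):
        print(f"{name}: token={t.token} ms, retransmit every ~{t.retransmit_ms()} ms, "
              f"consensus={t.consensus_ms()} ms, problems={t.problems()}")
```

Under these assumptions the wiki values keep the retransmit interval about the same (roughly 247 ms versus 238 ms), but allow 20 retransmits and 5 s without a token, instead of 4 retransmits and 1 s, before a new membership is formed.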

My configuration:

compatibility: whitetank
totem {
    version: 2
    secauth: off
    interface {
        member {
            memberaddr: 172.22.3.26
        }

        member {
            memberaddr: 172.22.3.25
        }

        # …and so on…

        ringnumber: 0
        # different on all hosts
        bindnetaddr: 172.22.3.26
        mcastport: 5405
    }
    transport: udpu
}
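Since the only per-host difference in this file is bindnetaddr, the file can be generated from a single member list. The Python sketch below uses a hypothetical helper, render_totem (not part of corosync), that reproduces the structure of the configuration above and can optionally add extra totem directives such as the wiki's token or consensus values. The usage example lists only the two member addresses shown above, because the remaining ones are elided in this post.

```python
def render_totem(local_addr, members, port=5405, overrides=None):
    """Hypothetical helper: build a udpu corosync.conf body for one host."""
    if local_addr not in members:
        raise ValueError(f"{local_addr} is not in the member list")
    lines = ["compatibility: whitetank", "totem {", "    version: 2", "    secauth: off"]
    # Optional extra totem directives, e.g. {"token": 5000, "consensus": 7500}.
    for key, value in (overrides or {}).items():
        lines.append(f"    {key}: {value}")
    lines.append("    interface {")
    for addr in members:
        lines += ["        member {", f"            memberaddr: {addr}", "        }"]
    lines += [
        "        ringnumber: 0",
        f"        bindnetaddr: {local_addr}",  # the only line that differs per host
        f"        mcastport: {port}",
        "    }",
        "    transport: udpu",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    members = ["172.22.3.26", "172.22.3.25"]  # remaining members elided in the post
    for host in members:
        print(f"# --- {host} ---")
        print(render_totem(host, members))
```

Generating every host's file from one list keeps the member blocks identical across the cluster, which is easy to get wrong when editing each file by hand.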

Many thanks,

-Mark

Mark Round
Senior Systems Administrator
NCC Group
Kings Court Kingston Road
Leatherhead, KT22 7SL
Telephone: +44 1372 383815
Mobile: +44 7790 770413
Website: www.nccgroup.com
Email: Mark.Round@xxxxxxxxxxxx

This email is sent for and on behalf of NCC Group. NCC Group is the trading name of NCC Group Performance Testing Limited (Registered in England CRN: 4069379). Registered Office: Manchester Technology Centre, Oxford Road, Manchester, M1 7EF. The ultimate holding company is NCC Group plc (Registered in England CRN: 4627044).

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss