Lots of "lost" nodes.


 



Hi all,

 

I’m seeing lots of nodes being reported as “lost” in my logs:

 

Oct 22 16:47:46 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:18:48 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:47:14 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 22 21:17:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 04:27:48 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:47 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:50 [10452] s3db1       crmd:     info: do_election_count_vote:  Election 11 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 07:18:50 [10452] s3db1       crmd:     info: do_election_count_vote:  Election 12 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 08:10:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 08:10:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 23 08:10:42 corosync [pcmk  ] info: pcmk_peer_update: lost: s3quorum1 1141053100
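
To see whether the losses cluster on particular nodes, lines like the ones above can be tallied per node. This is only a rough sketch: the regular expression is written against the exact "pcmk_peer_update: lost:" format shown here, and the function name count_lost is made up for illustration.

```python
import re
from collections import Counter

# Matches membership-loss lines of the form shown above, e.g.
# "... corosync [pcmk  ] info: pcmk_peer_update: lost: s3db3 419632812"
# The crmd election lines also contain "lost:", but not "pcmk_peer_update",
# so they are deliberately not counted.
LOST_RE = re.compile(r"pcmk_peer_update: lost: (\S+) (\d+)")


def count_lost(lines):
    """Return a Counter mapping node name -> number of 'lost' events."""
    counts = Counter()
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example use (the log path is an assumption, adjust to your setup):
#   with open("/var/log/cluster/corosync.log") as fh:
#       for node, n in count_lost(fh).most_common():
#           print(node, n)
```

A tally that is dominated by one or two nodes would point at those hosts or their links, whereas an even spread would look more like a cluster-wide timing issue.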

 

This is occurring on systems running nearly 100% idle, on a very quiet 10Gb/s network with no other activity and no packet loss. Any idea what else could be causing this? I notice that the “Initial Configuration” page on the ClusterLabs wiki (http://clusterlabs.org/wiki/Initial_Configuration) uses tweaked values which seem to deal with timeouts (see below). I have kept everything at the stock defaults, except that I’m using udpu. Are these settings a likely candidate? Is it recommended that these values be applied in a production environment?

 

Recommended values from the wiki:

    # How long before declaring a token lost (ms)
    token:          5000

    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 20

    # How long to wait for join messages in the membership protocol (ms)
    join:           1000

    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus:      7500

    # Turn off the virtual synchrony filter
    vsftype:        none

    # Number of messages that may be sent by one processor on receipt of the token
    max_messages:   20

    # Disable encryption
    secauth:        off

    # How many threads to use for encryption/decryption
    threads:        0
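
As far as I understand the corosync.conf(5) man page, consensus is expected to be at least 1.2 times token, so when changing one of these timeouts the other needs to be kept in step. The sketch below parses "key: value" lines like the ones above and checks that relationship; the helper names are invented for illustration, and the 1.2 factor is my reading of the man page rather than something stated on the wiki.

```python
def parse_totem_values(text):
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        values[key.strip()] = value.strip()
    return values


def consensus_meets_minimum(values):
    """Check consensus >= 1.2 * token (both in ms).

    Integer arithmetic (x10 / x12) avoids floating-point rounding.
    The 1.2 factor is an assumption taken from corosync.conf(5).
    """
    token = int(values["token"])
    consensus = int(values["consensus"])
    return consensus * 10 >= token * 12
```

With the wiki values, token is 5000 and consensus is 7500, which is above the 6000 ms minimum that this rule would imply.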

 

My configuration:

 

compatibility: whitetank

totem {
    version: 2
    secauth: off
    interface {
        member {
            memberaddr: 172.22.3.26
        }
        member {
            memberaddr: 172.22.3.25
        }

…. And so on…

        ringnumber: 0
        # different on all hosts
        bindnetaddr: 172.22.3.26
        mcastport: 5405
    }
    transport: udpu
}

 

 

 

Many thanks,

 

-Mark


Mark Round
Senior Systems Administrator
NCC Group
Kings Court Kingston Road
Leatherhead, KT22 7SL

Telephone: +44 1372 383815
Mobile: +44 7790 770413
Fax:
Website: www.nccgroup.com
Email:  Mark.Round@xxxxxxxxxxxx




_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
