Mark,
what version of corosync are you using? Can you please take a look at corosync.log to see whether there are also any messages about membership changes? If so, and you are using a new enough corosync, are there any messages about "corosync cannot be scheduled for ..."?

Regards,
  Honza

Mark Round wrote:
> Hi all,
>
> I'm seeing lots of nodes being "lost" in my logs:
>
> Oct 22 16:47:46 corosync [pcmk ] info: pcmk_peer_update: lost: s3db3 419632812
> Oct 22 18:18:48 corosync [pcmk ] info: pcmk_peer_update: lost: s3db3 419632812
> Oct 22 18:47:14 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
> Oct 22 21:17:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
> Oct 23 04:27:48 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
> Oct 23 07:18:47 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
> Oct 23 07:18:50 [10452] s3db1 crmd: info: do_election_count_vote: Election 11 (owner: s3db5) lost: vote from s3db5 (Uptime)
> Oct 23 07:18:50 [10452] s3db1 crmd: info: do_election_count_vote: Election 12 (owner: s3db5) lost: vote from s3db5 (Uptime)
> Oct 23 08:10:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
> Oct 23 08:10:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3db3 419632812
> Oct 23 08:10:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3quorum1 1141053100
>
> This is occurring on systems running nearly 100% idle on a very quiet 10Gb/s network with no other activity and no packet loss. Any idea what else could be causing this?
>
> I notice that the ClusterLabs wiki "Initial Configuration" page (http://clusterlabs.org/wiki/Initial_Configuration) lists tweaked values that seem to deal with timeouts (see below). I have kept everything at the stock defaults, except that I'm using udpu. Are these timeouts a likely candidate? Is it recommended to apply these values in a production environment?
>
> Recommended values from the wiki:
>
> # How long before declaring a token lost (ms)
> token: 5000
> # How many token retransmits before forming a new configuration
> token_retransmits_before_loss_const: 20
> # How long to wait for join messages in the membership protocol (ms)
> join: 1000
> # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
> consensus: 7500
> # Turn off the virtual synchrony filter
> vsftype: none
> # Number of messages that may be sent by one processor on receipt of the token
> max_messages: 20
> # Disable encryption
> secauth: off
> # How many threads to use for encryption/decryption
> threads: 0
>
> My configuration:
>
> compatibility: whitetank
> totem {
>     version: 2
>     secauth: off
>     interface {
>         member {
>             memberaddr: 172.22.3.26
>         }
>         member {
>             memberaddr: 172.22.3.25
>         }
>
>         .... And so on...
>
>         ringnumber: 0
>         # different on all hosts
>         bindnetaddr: 172.22.3.26
>         mcastport: 5405
>     }
>     transport: udpu
> }
>
> Many thanks,
>
> -Mark
>
> Mark Round
> Senior Systems Administrator
> NCC Group
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
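One way to compare Mark's stock settings with the wiki values is to work out the timers corosync derives from "token". The sketch below is a hypothetical helper: the Totem class and its methods are ours, not part of corosync. The defaults (token 1000 ms, token_retransmits_before_loss_const 4) and the formulas (token_retransmit = token / (token_retransmits_before_loss_const + 0.2), truncated; consensus defaulting to 1.2 * token and required to be at least that) are our reading of corosync.conf(5) and the corosync 1.x/2.x source, so please check them against the man page of the version you actually run.

```python
# Hypothetical helper (not part of corosync): derive the totem timers that
# corosync computes from `token`, using integer math to mirror truncation.
# Assumed defaults and formulas: our reading of corosync.conf(5) and the
# corosync 1.x/2.x source; verify against your installed version.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Totem:
    token: int = 1000                             # ms, assumed default
    token_retransmits_before_loss_const: int = 4  # assumed default
    consensus: Optional[int] = None               # ms, None = derived from token

    def token_retransmit(self) -> int:
        # token / (retransmits + 0.2), truncated to whole ms.
        # 5*token / (5*retransmits + 1) is the same ratio without floats.
        return (5 * self.token) // (5 * self.token_retransmits_before_loss_const + 1)

    def min_consensus(self) -> int:
        # Minimum (and default) consensus: 1.2 * token, truncated to whole ms.
        return (6 * self.token) // 5

    def effective_consensus(self) -> int:
        # Use the explicit value if set, otherwise fall back to 1.2 * token.
        if self.consensus is not None:
            return self.consensus
        return self.min_consensus()

    def consensus_ok(self) -> bool:
        # consensus must be at least 1.2 * token.
        return self.effective_consensus() >= self.min_consensus()


if __name__ == "__main__":
    stock = Totem()
    wiki = Totem(token=5000, token_retransmits_before_loss_const=20, consensus=7500)
    for name, t in (("stock", stock), ("wiki", wiki)):
        # name, token (ms), token_retransmit (ms), consensus (ms), valid?
        print(name, t.token, t.token_retransmit(), t.effective_consensus(), t.consensus_ok())
```

Under these assumptions the stock settings retransmit the token roughly every 238 ms and declare it lost after about 1 s without it, while the wiki values keep a similar retransmit interval (about 247 ms) but wait about 5 s before declaring loss. A short stall on an otherwise idle node (for example, corosync not being scheduled in time) is therefore less likely to show up as a "lost" member with the wiki values, which is why the log messages asked about above are worth checking before changing anything.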