Hi all,
I’m seeing lots of nodes being “lost” in my logs:
Oct 22 16:47:46 corosync [pcmk ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:18:48 corosync [pcmk ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:47:14 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 22 21:17:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 04:27:48 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:47 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:50 [10452] s3db1 crmd: info: do_election_count_vote: Election 11 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 07:18:50 [10452] s3db1 crmd: info: do_election_count_vote: Election 12 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 08:10:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 08:10:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 23 08:10:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3quorum1 1141053100
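To see which nodes drop most often, a rough Python sketch like the following can tally the membership-loss lines per node. The function name count_lost and the regular expression are my own (hypothetical) choices, and the pattern assumes the log format shown above. It deliberately skips the crmd election lines, which also contain the word "lost" but do not report a membership change.

```python
import re
from collections import Counter

# Matches corosync membership-loss lines of the form shown in the log excerpt:
#   ... corosync [pcmk ] info: pcmk_peer_update: lost: <node> <nodeid>
# Assumption: the format is exactly as in the excerpt from this post.
LOST_RE = re.compile(r"pcmk_peer_update: lost: (\S+) (\d+)")

def count_lost(lines):
    """Return a Counter mapping node name -> number of 'lost' events."""
    counts = Counter()
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The log excerpt from this post, one entry per line.
SAMPLE = """\
Oct 22 16:47:46 corosync [pcmk ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:18:48 corosync [pcmk ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 22 18:47:14 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 22 21:17:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 04:27:48 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:47 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 07:18:50 [10452] s3db1 crmd: info: do_election_count_vote: Election 11 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 07:18:50 [10452] s3db1 crmd: info: do_election_count_vote: Election 12 (owner: s3db5) lost: vote from s3db5 (Uptime)
Oct 23 08:10:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3db5 402855596
Oct 23 08:10:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3db3 419632812
Oct 23 08:10:42 corosync [pcmk ] info: pcmk_peer_update: lost: s3quorum1 1141053100
"""

if __name__ == "__main__":
    # Print nodes from most to least frequently lost.
    for node, n in count_lost(SAMPLE.splitlines()).most_common():
        print(node, n)
```

On the excerpt above this gives 5 events for s3db5, 3 for s3db3 and 1 for s3quorum1.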
This is occurring on systems running nearly 100% idle, on a very quiet 10Gb/s network with no other activity and no packet loss. Any idea what else could be causing this? I notice that the “Initial Configuration” page on the ClusterLabs wiki (http://clusterlabs.org/wiki/Initial_Configuration) uses tweaked values which seem to deal with timeouts (see below). I have kept everything at the stock defaults, except that I’m using udpu. Are these timeouts a likely candidate? Is it recommended to apply the wiki values in a production environment?
Recommended values from Wiki:

# How long before declaring a token lost (ms)
token: 5000

# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 20

# How long to wait for join messages in the membership protocol (ms)
join: 1000

# How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
consensus: 7500

# Turn off the virtual synchrony filter
vsftype: none

# Number of messages that may be sent by one processor on receipt of the token
max_messages: 20

# Disable encryption
secauth: off

# How many threads to use for encryption/decryption
threads: 0
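As a sanity check on how these timeouts relate to each other, here is a small Python sketch. It assumes the rule, as I read it in the corosync.conf(5) man page, that consensus must be at least 1.2 * token; please correct me if that does not apply to this version. The helper names min_consensus and consensus_ok are hypothetical and only for illustration.

```python
# Timeout values (milliseconds) as listed on the wiki page quoted above.
WIKI_VALUES = {"token": 5000, "join": 1000, "consensus": 7500}

def min_consensus(token_ms):
    """Smallest consensus timeout for a given token timeout.

    Assumption: consensus must be at least 1.2 * token.
    Integer arithmetic (6/5) avoids floating-point rounding.
    """
    return token_ms * 6 // 5

def consensus_ok(cfg):
    """True if the configured consensus satisfies the assumed 1.2 * token rule."""
    return cfg["consensus"] >= min_consensus(cfg["token"])

if __name__ == "__main__":
    print("minimum consensus (ms):", min_consensus(WIKI_VALUES["token"]))
    print("wiki values consistent:", consensus_ok(WIKI_VALUES))
```

With the wiki values, the minimum consensus works out to 6000 ms, so the listed 7500 ms satisfies the assumed rule.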
My configuration:

compatibility: whitetank
totem {
    version: 2
    secauth: off
    interface {
        member {
            memberaddr: 172.22.3.26
        }
        member {
            memberaddr: 172.22.3.25
        }
        …. And so on…
        ringnumber: 0
        # different on all hosts
        bindnetaddr: 172.22.3.26
        mcastport: 5405
    }
    transport: udpu
}
Many thanks,
-Mark