That was very helpful, thank you!
On 8/5/2013 6:05 AM, Jan Friesse wrote:
Russell,
Russell Jones napsal(a):
Hi all,
I am trying to understand how the corosync token, token_retransmit, and
token_retransmits_before_loss_const variables all tie in together.
Definitely take a look at the corosync.conf man page.
Summary:
token: How long to wait for the token to arrive. If it is not received
in that time, the node starts forming a new cluster membership.
token_retransmit is automatically computed from token and
token_retransmits_before_loss_const. It is used to make membership
more stable. If the token is not received in the given time, the previous
token is retransmitted. So if the token was lost on the wire (which is
possible, because UDP is used), it can be sent again. This value is
SMALLER than token (usually 1/4 of token), which means 4 tokens are sent
before a node tries to re-form the membership.
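As a rough illustration of the timing described above, here is a minimal
Python sketch. It assumes the simplified 1/4 ratio from this message
(retransmit interval = token / token_retransmits_before_loss_const) and
reads "4 tokens are sent" as the original token plus three retransmits.
The function name retransmit_schedule is hypothetical and not part of
corosync, and the daemon's own computation may differ in detail.

```python
def retransmit_schedule(token_ms, retransmits_before_loss=4):
    """Return (retransmit times in ms, time at which token loss is declared).

    Assumption: uses the simplified ratio from this thread, not
    corosync's actual implementation.
    """
    interval = token_ms / retransmits_before_loss
    # Retransmits fire at each interval that falls before the token timeout.
    times = [interval * i for i in range(1, retransmits_before_loss)]
    return times, token_ms


times, loss_at = retransmit_schedule(10000)
print(times)    # [2500.0, 5000.0, 7500.0]
print(loss_at)  # 10000
```

With token set to 10000, the sketch retransmits at 2.5, 5 and 7.5 seconds
and only declares the token lost at 10 seconds, which matches the
detection time reported in the original question.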
Generally, don't modify token_retransmit and
token_retransmits_before_loss_const. Just modify token if you have high
latency. Some setups (very rarely) also need to modify send_join and join.
I have a standard RHCS v3 cluster set up and running. The token timeout
is set to 10000. When testing it seems to detect failed members pretty
consistently within 10 seconds. What I am not understanding is *when* a
node is declared dead, and a fence call is actually made. The man pages
show that the cluster is reconfigured when the "token" time is reached,
and also when token_retransmits_before_loss_const is reached. This is
confusing :-)
As I said, the formula is token / token_retransmits_before_loss_const =
token_retransmit. So just set token if you need something special. If
you set token_retransmit incorrectly, either it or token may take
precedence (whichever is smaller).
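The formula above can be sketched in Python as follows. This is only an
illustration under the simplified ratio given in this message. The helper
names derive_token_retransmit and override_fires_before_token are
hypothetical, and corosync's own computation may differ in detail.

```python
def derive_token_retransmit(token_ms, retransmits_before_loss=4):
    """Simplified ratio from this thread:
    token / token_retransmits_before_loss_const."""
    if retransmits_before_loss <= 0:
        raise ValueError("retransmits_before_loss must be positive")
    return token_ms / retransmits_before_loss


def override_fires_before_token(token_ms, token_retransmit_ms):
    """A manually set token_retransmit can only trigger a retransmit
    if its timer expires before the token timeout does."""
    return 0 < token_retransmit_ms < token_ms


print(derive_token_retransmit(10000))              # 2500.0
print(override_fires_before_token(10000, 12000))   # False
```

The second helper shows why a hand-set token_retransmit is risky: if it
is not smaller than token, the token timeout expires first and no
retransmit ever happens.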
Which one is it that will reform the cluster? Both? When does one take
precedence over the other?
Both. Whichever one is smaller takes precedence.
Thanks!
Regards,
Honza