Understanding heartbeat timeouts in RHEL4

I am trying to understand how these timers interact with each other.

 

In a RHEL4 cluster the heartbeat defaults are:

hello_timer:5

max_retries:5

deadnode_timeout:21

 

My understanding is that a heartbeat message is sent every 5 seconds. If no response is received, a deadnode countdown of 21 seconds starts, and the node also sends up to 5 more heartbeat requests. What is the interval between those retries? If none of them gets a response, 5 seconds pass, leaving 16 seconds on the deadnode timer, and we again try up to 5 times to get a response. This continues until the 4th iteration of the hello timer, which again tries up to 5 times and fails. We then reach the 21-second deadnode timeout, fenced takes over, and the node is rebooted.
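
To make the arithmetic concrete, here is a rough Python sketch of the timeline as I currently picture it. It only models my assumption above (hello ticks every hello_timer seconds, counted from the moment the deadnode countdown starts); it is not taken from the cman source, and the function names are made up for illustration. max_retries is left out because I do not know the retry interval.

```python
# Sketch of the assumed timeline only; not the actual cman behaviour.
HELLO_TIMER = 5        # seconds between heartbeat (hello) messages
DEADNODE_TIMEOUT = 21  # seconds of silence before the node is declared dead


def hello_ticks(hello_timer, deadnode_timeout):
    """Times (seconds after the countdown starts) of hello ticks
    that fall before the deadnode timeout expires."""
    return list(range(hello_timer, deadnode_timeout, hello_timer))


def timeline(hello_timer, deadnode_timeout):
    """Pairs of (tick time, seconds left on the deadnode timer)."""
    return [(t, deadnode_timeout - t)
            for t in hello_ticks(hello_timer, deadnode_timeout)]


if __name__ == "__main__":
    for t, remaining in timeline(HELLO_TIMER, DEADNODE_TIMEOUT):
        print(f"t={t:2d}s: hello tick, {remaining}s left on deadnode timer")
    print(f"t={DEADNODE_TIMEOUT}s: deadnode timeout expires, node is fenced")
```

With the defaults this gives four hello ticks (at 5, 10, 15 and 20 seconds) before the timeout at 21 seconds, which is where the "4th iteration" above comes from.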

 

Is my understanding of this correct?

 

Thanks for any help.

 

Michael

 

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
