Ok, so let me ask this. I did a tcpdump between nodes. Is the heartbeat the UDP packet I see? I also see an XML doc. It looks like node1 keeps uptime and other cluster info for itself and node2. node2 keeps uptime and cluster info for nodes 1 and 3. Node 3 does the same for 2 and 4, and so on. I assume if a node dies then the next closest node starts watching the uptime for that node until the failed node rejoins.

Thanks again

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Chrissie Caulfield
Sent: Wednesday, May 06, 2009 4:06 AM
To: linux clustering
Subject: Re: Heartbeat time outs in rhel4 understanding

Elias, Michael wrote:
> I am trying to understand how these timers interact with each other.
>
> In a RHEL4 cluster the heartbeat defaults are:
>
> hello_timer:5
> max_retries:5
> deadnode_timeout:21
>
> Meaning a heartbeat message is sent every 5 seconds, if it fails to
> receive a response it will start a deadnode counter @ 21 seconds. It
> will also try to send 5 more heartbeat requests. What is the interval of
> those retries? If none of those requests receive a response, 5 seconds
> pass, there are 15 seconds left on the deadnode timer and we try up to 5
> times to get a response. This goes on until we hit the 4th iteration
> of the hello timer; it tries again up to 5 times and fails. We then hit
> 21 seconds on the deadnode timer, fenced takes over and wham, reboot.
>
> Is my understanding of this correct?
>

No, I'm afraid it isn't :-)

max_retries has nothing to do with the heartbeat. It is to do with cluster messages, such as service join requests, clvmd messages or the messages used in the membership protocol.

So the heartbeat system is just a 5 second heartbeat, and after 21 seconds the node will be evicted from the cluster and (usually) fenced.

The same happens for data messages if max_retries is exceeded.
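As a rough illustration of the heartbeat timing described above (a sketch only, not the actual cman code): the constants are the RHEL4 defaults quoted in the thread, while the function names and the assumption that eviction happens exactly when the silence reaches deadnode_timeout are hypothetical.

```python
HELLO_TIMER = 5        # seconds between heartbeats (default quoted in the thread)
DEADNODE_TIMEOUT = 21  # seconds of silence before eviction (default quoted in the thread)


def heartbeats_missed_before_eviction(hello_timer=HELLO_TIMER,
                                      deadnode_timeout=DEADNODE_TIMEOUT):
    """Count the heartbeat slots that pass unanswered before the timeout expires."""
    return deadnode_timeout // hello_timer


def is_evicted(last_heard, now, deadnode_timeout=DEADNODE_TIMEOUT):
    """Hypothetical helper: True once a node has been silent for the full timeout.

    Assumes eviction triggers when the silence reaches the timeout exactly.
    """
    return (now - last_heard) >= deadnode_timeout


if __name__ == "__main__":
    print(heartbeats_missed_before_eviction())
    print(is_evicted(last_heard=0, now=20))
    print(is_evicted(last_heard=0, now=21))
```

Note that max_retries does not appear anywhere in this sketch, which is the point: the heartbeat path depends only on hello_timer and deadnode_timeout.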
The retry period for data messages starts at 1 second and increases each time, to avoid filling the Ethernet buffers.

I hope this helps,

Chrissie

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster