Rob,

Here's a summary of what I have observed with this configuration. You may want to verify the accuracy of my observations on your own.

Starting with RHEL3, RHCS verifies node membership via a network heartbeat rather than (or in addition to) a disk timestamp. The heartbeat traffic moves over the same interface that is used to access the network resources, so there is no dedicated heartbeat interface like you would see in a Microsoft cluster.

The tiebreaker IP is used to prevent a split-brain situation in a cluster with an even number of nodes. Say you have two active cluster nodes, nodeA and nodeB, and nodeA owns an NFS disk resource and IP. Now suppose nodeA fails to receive a heartbeat from nodeB over its primary interface. That could mean several things: nodeA's interface is down, nodeB's interface is down, or their shared switch is down. When nodeA and nodeB stop communicating with each other, they will both try to ping the tiebreaker IP, which is usually your default gateway. If nodeA gets a response from the tiebreaker, it will continue to own the resource. If it can't, it will assume its external interface is down and fence/reboot itself. The same holds true for nodeB.

Unlike RHEL2.1, which used STONITH, RHEL3 cluster nodes reboot themselves. Therefore, even if nodeB can reach the tiebreaker and CAN'T reach nodeA, it will not take the cluster resource until nodeA releases it. This keeps the nodes from accessing the shared disk resource concurrently, and it prevents split brain by ensuring the resource owner doesn't get killed accidentally by its peer.

For those who remember, ping-ponging was a big problem with RHEL2.1 clusters: if a node couldn't read its partner's disk timestamp update in a timely manner -- due to I/O latency or whatever -- it would reboot its partner. On reboot, the rebooted node would STONITH the other node, and so on.

Anyway, I hope this answers your questions. It is fairly easy to test:

1. Set up a two-node cluster, then reboot the service owner. If the service starts on the other node, you should be configured correctly.
2. Disconnect the service owner from the network. It should reboot itself via the watchdog or fail the resource over, depending on how it's configured.
3. Repeat the test with the non-owner; the resources should not move in this case.
4. Then take turns disconnecting each node from the shared storage.

Cheers,
jacob

> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Veer, Rob ter
> Sent: Thursday, August 25, 2005 4:45 AM
> To: linux-cluster@xxxxxxxxxx
> Subject: Tiebreaker IP
>
> Hello,
>
> I'm trying to get a deeper insight into the workings of the
> tiebreaker with a standard RHCS cluster configuration. It's
> not clear to me what the role of the tiebreaker within a 2 or
> 4 node cluster exactly is.
>
> Is there any documentation on this subject? I'm particularly
> interested in the flow of events when the tiebreaker is used.
> I know the tiebreaker is used to prevent node isolation, but
> how exactly?
>
> Regards,
> Rob.
>
> --
> Linux-cluster@xxxxxxxxxx
> http://www.redhat.com/mailman/listinfo/linux-cluster
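
For what it's worth, here is a rough sketch (in Python, just because it is easy to read) of the decision flow I described above. It is purely illustrative -- not taken from the clumanager sources, and the function name and return strings are made up -- but it captures the "ping the tiebreaker, then decide" behaviour:

# Illustrative sketch only; not clumanager code. "resource" stands for the
# NFS disk + IP service from the nodeA/nodeB example above.

def on_heartbeat_loss(owns_resource, tiebreaker_reachable):
    """What a node does once it stops hearing its peer's network heartbeat."""
    if not tiebreaker_reachable:
        # Can't reach the gateway either: assume my own uplink is dead and
        # fence myself (reboot) so the peer can take over safely.
        return "reboot self"
    if owns_resource:
        # I can still reach the outside world, so keep serving clients.
        return "keep resource"
    # Healthy non-owner: do NOT grab the resource just because the peer went
    # quiet -- wait for the owner to release it (or reboot itself). This is
    # what keeps both nodes off the shared disk at the same time.
    return "wait for owner to release"

if __name__ == "__main__":
    # Walk through the combinations from the example above.
    for owns in (True, False):
        for reachable in (True, False):
            print(f"owner={owns} tiebreaker_reachable={reachable} -> "
                  f"{on_heartbeat_loss(owns, reachable)}")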
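
If you want to poke at the reachability side by hand while running the disconnect tests, something like the snippet below does roughly what a node does when it pings the tiebreaker. Again, this is a stand-in written for illustration, not part of the cluster software, and 192.168.1.1 is a placeholder for your own default gateway / tiebreaker address:

import subprocess
import sys

TIEBREAKER_IP = "192.168.1.1"  # placeholder; substitute your gateway/tiebreaker

def tiebreaker_reachable(ip, count=3, timeout=5):
    """Return True if the address answers ICMP echo within `timeout` seconds."""
    rc = subprocess.call(
        ["ping", "-c", str(count), "-w", str(timeout), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return rc == 0

if __name__ == "__main__":
    ok = tiebreaker_reachable(TIEBREAKER_IP)
    print("tiebreaker reachable" if ok else "tiebreaker unreachable")
    sys.exit(0 if ok else 1)

Run it on each node while its interface is unplugged and you can see which side would consider itself isolated.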