I am using a two-node cluster without a tiebreaker and find that the
documentation that RH provides and some of the technical info provided
by RH engineering folks do not agree with each other. Red Hat docs say that if
all networking fails, the nodes will not fail over because they still have the
disk to act as heartbeat, and that the services will continue. Yet I tested
this and it does not happen this way. The system gets STONITHed. So what is the
real story here? Below are the docs I am talking about.
Below are two excerpts from RH documentation that say the following about loss of network connections
in a two-node cluster:
----------------------------------------------------------------------------
Total Network Connection Failure
A total network connection failure occurs when all the heartbeat network connections between the systems fail. This can be caused by one of the following:
All the heartbeat network cables are disconnected from a system.
All the serial connections and network interfaces used for heartbeat communication fail.
If a total network connection failure occurs, both systems detect the problem, but they also detect that the SCSI disk connections are still active. Therefore, services remain running on the systems and are not interrupted.
---------------------------------------------------------------------------------
From a RH FAQ list
----------------------------------------------------------------------------------
E.4. Common Behaviors: Two Member Cluster with Disk-based Tie-breaker
Loss of network connectivity to other member, shared media still accessible
Common Causes: Network connectivity lost.
Test Case: Disconnect all network cables from a member.
Expected Behavior: No fail-over unless disk updates are also lost. Services cannot be
relocated in most cases, because the lock server requires network connectivity.
----------------------------------------------------------------------
However, this does not seem to be the case. The systems stop the service or get STONITHed.
Below is some info from this message board with a reply from RH engineering that seems to
confirm that the nodes will get STONITHed. This is followed by more from RH engineering that
does conform to the RH docs.
RE: Tiebreaker IP
------------------------------------------------------------------------
* From: <JACOB_LIBERMAN Dell com>
* To: <linux-cluster redhat com>
* Subject: RE: Tiebreaker IP
* Date: Fri, 26 Aug 2005 13:24:39 -0500
------------------------------------------------------------------------
Rob,
Here's a summary of what I have observed with this configuration. You may
want to verify the accuracy of my observations on your own.
Starting with RHEL3, RHCS verified node membership via a network
heartbeat rather than (or in addition to) a disk timestamp. The network
heartbeat traffic moves over the same interface that is used to access
the network resources. This means that there is no dedicated heartbeat
interface like you would see in a Microsoft cluster.
The tiebreaker IP is used to prevent a split-brain situation in a
cluster with an even number of nodes. Let's say you have 2 active cluster
nodes, say nodeA and nodeB, and nodeA owns an NFS disk resource and
IP. Then let's say nodeA fails to receive a heartbeat from nodeB over its
primary interface. This could mean several things: nodeA's interface is
down, nodeB's interface is down, or their shared switch is down. So if
nodeA and nodeB stop communicating with each other, they will both try to
ping the tiebreaker IP, which is usually your default gateway IP. If
nodeA gets a response from the tiebreaker IP, it will continue to own
the resource. If it can't, it will assume its external interface is down
and fence/reboot itself. The same holds true for nodeB. Unlike RHEL2.1,
which used STONITH, RHEL3 cluster nodes reboot themselves. Therefore,
even if nodeB can reach the tiebreaker and CAN'T reach nodeA, it will not
get the cluster resource until nodeA releases it. This prevents the
nodes from accessing the shared disk resource concomitantly.
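To make the decision logic Jacob describes concrete, here is a rough Python
sketch of what each node would do when the peer's heartbeat goes quiet. This
is my own illustration of his description, not actual RHCS code; the function
names and the ping timeout are assumptions:

    import subprocess

    def can_ping(ip):
        # One ICMP echo with a short timeout; assumes the Linux 'ping' binary.
        return subprocess.call(["ping", "-c", "1", "-W", "2", ip]) == 0

    def on_heartbeat_loss(tiebreaker_ip, owns_resource):
        # Illustrative per-node decision when the peer's heartbeat stops.
        if can_ping(tiebreaker_ip):
            # Our external path still works: hold on to (or wait for) the resource.
            return "keep resource" if owns_resource else "wait for peer to release it"
        # Our own interface is presumably dead: remove ourselves from the cluster.
        return "reboot self"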
This configuration prevents split brain by ensuring the resource owner
doesn't get killed accidentally by its peer. For those that remember,
ping-ponging was a big problem with RHEL2.1 clusters. If they couldn't
read their partner's disk timestamp update in a timely manner -- due to
IO latency or whatever -- they would reboot their partner node. On
reboot, the rebooted node would STONITH the other node, and so on.
Anyway, I hope this answers your questions. It is fairly easy to test.
Set up a 2-node cluster, then reboot the service owner. If the service
starts on the other node, you should be configured correctly. Next,
disconnect the service owner from the network. The service owner should
reboot itself with the watchdog or fail over the resource, depending on
how it's configured. Repeat this test with the non-service owner. (The
resources should not move in this case.) Then take turns disconnecting
them from the shared storage.
Cheers, jacob
-----------------------------------------------------------------------------
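Restating Jacob's test sequence as data makes it easier to work through as a
checklist. The expected results are his observations, not anything I have
verified myself:

    # Failure injected on a two-node cluster -> behavior Jacob says to expect.
    TEST_MATRIX = [
        ("reboot the service owner",
         "service starts on the other node"),
        ("disconnect the service owner from the network",
         "owner reboots itself (watchdog) or fails over, depending on config"),
        ("disconnect the non-owner from the network",
         "resources do not move"),
        ("disconnect each node in turn from shared storage",
         "exercise both nodes; he does not state the expected result"),
    ]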
RE: Tiebreaker IP
------------------------------------------------------------------------
* From: Lon Hohberger <lhh redhat com>
* To: linux clustering <linux-cluster redhat com>
* Subject: RE: Tiebreaker IP
* Date: Mon, 29 Aug 2005 15:19:40 -0400
------------------------------------------------------------------------
On Fri, 2005-08-26 at 13:24 -0500, JACOB_LIBERMAN Dell com wrote:
> If it can't, it will assume its external interface is down
> and fence/reboot itself. The same holds true for nodeB. Unlike RHEL2.1,
> which used STONITH, RHEL3 cluster nodes reboot themselves.
Both use STONITH. RHEL3 cluster nodes are more paranoid about running
without STONITH.
If STONITH is configured on a RHEL3 cluster, the node will instead wait
to be shot -- or for a new quorum to form -- if it loses network
connectivity.
> Anyway, I hope this answers your questions. It is fairly easy to test.
> Set up a 2-node cluster, then reboot the service owner. If the service
> starts on the other node, you should be configured correctly. Next,
> disconnect the service owner from the network. The service owner should
> reboot itself with the watchdog or fail over the resource, depending on
> how it's configured.
It should reboot itself because it loses quorum, really. Basically,
without STONITH, a node thinks like this on RHEL3:
"I was quorate and now I'm not, and no one can cut me off from shared
storage... Uh, oh, REBOOT!"
-- Lon
------------------------------------------------------------------------
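Lon's two cases reduce to a small decision rule. A minimal sketch, assuming
the behavior is exactly as he states it (pseudologic, not cluster source):

    def on_connectivity_loss(was_quorate, stonith_configured):
        # Illustrative RHEL3 node behavior on losing network connectivity.
        if not was_quorate:
            return "stay down; already inquorate"
        if stonith_configured:
            # With STONITH the node can safely wait: either it gets shot,
            # or a new quorum forms around it.
            return "wait to be fenced or for a new quorum"
        # Without STONITH, nothing can cut this node off from shared storage,
        # so the only safe move is: "I was quorate and now I'm not -- REBOOT!"
        return "reboot self"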
More from RH engineering:
-----------------------------------------------------------------------------
>The disk tiebreaker works in a similar way, except that it lets the
>cluster limp along in a safe, semi-split-brain state during a
>network outage. What I mean is that because there's state information
>written to/read from the shared raw partitions, the nodes can actually
>tell via other means whether or not the other node is "alive", as
>opposed to relying solely on the network traffic.
>Both nodes update state information on the shared partitions. When one
>node detects that the other node has not updated its information for a
>period of time, that node is "down" according to the disk subsystem. If
>this coincides with a "down" status from the membership daemon, the node
>is fenced and services are failed over. If the node never goes down
>(and keeps updating its information on the shared partitions), then the
>node is never fenced and services never fail over.
-- Lon
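A minimal sketch of the disk-tiebreaker bookkeeping Lon describes; the
staleness threshold and the names here are my assumptions, not RHCS values:

    import time

    STALE_AFTER = 10.0  # assumed seconds without an update before the peer looks dead

    def peer_down_on_disk(peer_last_update, now=None):
        # The peer is "down" to the disk subsystem once its state block on the
        # shared raw partition has not been refreshed within the threshold.
        now = time.time() if now is None else now
        return (now - peer_last_update) > STALE_AFTER

    def should_fence(down_on_disk, down_per_membership):
        # Fencing and failover require BOTH signals to agree; a node that keeps
        # writing its state block is never fenced, so services never fail over.
        return down_on_disk and down_per_membership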
14. What is a quorum disk/partition and what does it do for you?
A quorum disk or partition is a section of a disk that's set up
for use with components of the cluster project. It has a couple of
purposes. Again, I'll explain with an example.
Suppose you have nodes A and B, and node A fails to get several of
the cluster manager's "heartbeat" packets from node B. Node A doesn't
know why it hasn't received the packets, but there are several
possibilities: node B has failed, the network switch or hub
has failed, node A's network adapter has failed, or node B was
simply too busy to send the packet. That can
happen if your cluster is extremely large, your systems are
extremely busy, or your network is flaky.
Node A doesn't know which is the case, and it doesn't know whether
the problem lies within itself or with node B. This is especially
problematic in a two-node cluster because both nodes, out of touch
with one another, can try to fence the other.
So before fencing a node, it would be nice to have another way to
check if the other node is really alive, even though we can't seem
to contact it. A quorum disk gives you the ability to do just
that. Before fencing a node that's out of touch, the cluster
software can check whether the node is still alive based on
whether it has written data to the quorum partition.
In the case of two-node systems, the quorum disk also acts as a
tie-breaker. If a node has access to the quorum disk and the
network, that counts as two votes.
A node that has lost contact with the network or the quorum disk
has lost a vote, and therefore may safely be fenced.
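The FAQ's vote arithmetic for the two-node case comes down to this (a sketch;
real quorum-disk vote counts are configurable):

    def votes(has_network, has_quorum_disk):
        # Two-node tie-breaker: network contact + quorum-disk access = 2 votes.
        return int(has_network) + int(has_quorum_disk)

    def may_be_safely_fenced(has_network, has_quorum_disk):
        # Losing either the network or the quorum disk drops a vote, and a
        # node that has lost a vote may safely be fenced by its peer.
        return votes(has_network, has_quorum_disk) < 2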
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster