I am using a two-node cluster without a tiebreaker and find that the
documentation that RH provides and some of the technical info provided
by RH engineering folks do not agree with each other. Red Hat docs say that if
all networking fails, the nodes will not fail over because they still have the
disk to act as heartbeat, and that the services will continue. Yet I tested
this and it does not happen this way. The system gets STONITHed. So what is the
real story here? Below are the docs I am talking about.
Below are two excerpts from RH documentation that say the following about loss of network connections
in a two-node cluster:
----------------------------------------------------------------------------
Total Network Connection Failure
A total network connection failure occurs when all the heartbeat network connections between the systems fail. This can be caused by one of the following:
All the heartbeat network cables are disconnected from a system.
All the serial connections and network interfaces used for heartbeat communication fail.
If a total network connection failure occurs, both systems detect the problem, but they also detect that the SCSI disk connections are still active. Therefore, services remain running on the systems and are not interrupted.
---------------------------------------------------------------------------------
From a RH FAQ list
----------------------------------------------------------------------------------
E.4. Common Behaviors: Two Member Cluster with Disk-based Tie-breaker
Loss of network connectivity to other member, shared media still accessible
Common Causes: Network connectivity lost.
Test Case: Disconnect all network cables from a member.
Expected Behavior: No fail-over unless disk updates are also lost. Services cannot be
relocated in most cases, because the lock server requires network connectivity.
----------------------------------------------------------------------
However, this does not seem to be the case. The systems stop the service or get STONITHed.
Below is some info from this message board with a reply from RH engineering that seems to
confirm that the nodes will get STONITHed. This is followed by more from RH engineering that
does conform to the RH docs.
RE: Tiebreaker IP
------------------------------------------------------------------------
* From: <JACOB_LIBERMAN Dell com>
* To: <linux-cluster redhat com>
* Subject: RE: Tiebreaker IP
* Date: Fri, 26 Aug 2005 13:24:39 -0500
------------------------------------------------------------------------
Rob,
Here's a summary of what I have observed with this configuration. You may
want to verify the accuracy of my observations on your own.
Starting with RHEL3, RHCS verified node membership via a network
heartbeat rather than (or in addition to) a disk timestamp. The network
heartbeat traffic moves over the same interface that is used to access
the network resources. This means that there is no dedicated heartbeat
interface like you would see in a Microsoft cluster.
The tiebreaker IP is used to prevent a split-brain situation in a
cluster with an even number of nodes. Let's say you have 2 active cluster
nodes, say nodeA and nodeB, and nodeA owns an NFS disk resource and
IP. Then let's say nodeA fails to receive a heartbeat from nodeB over its
primary interface. This could mean several things: nodeA's interface is
down, nodeB's interface is down, or their shared switch is down. So if
nodeA and nodeB stop communicating with each other, they will both try to
ping the tiebreaker IP, which is usually your default gateway IP. If
nodeA gets a response from the tiebreaker IP, it will continue to own
the resource. If it can't, it will assume its external interface is down
and fence/reboot itself. The same holds true for nodeB. Unlike RHEL2.1,
which used STONITH, RHEL3 cluster nodes reboot themselves. Therefore,
even if nodeB can reach the tiebreaker and CAN'T reach nodeA, it will not
get the cluster resource until nodeA releases it. This prevents the
nodes from accessing the shared disk resource concomitantly.
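To make the decision logic Jacob describes concrete, here is a rough Python
sketch of what each node would do when the peer's heartbeat goes quiet. This
is my own illustration of his description, not actual RHCS code; the function
names and the ping timeout are assumptions:

    import subprocess

    def can_ping(ip):
        # One ICMP echo with a short timeout; assumes the Linux 'ping' binary.
        return subprocess.call(["ping", "-c", "1", "-W", "2", ip]) == 0

    def on_heartbeat_loss(tiebreaker_ip, owns_resource):
        # Illustrative per-node decision when the peer's heartbeat stops.
        if can_ping(tiebreaker_ip):
            # Our external path still works: hold on to (or wait for) the resource.
            return "keep resource" if owns_resource else "wait for peer to release it"
        # Our own interface is presumably dead: remove ourselves from the cluster.
        return "reboot self"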
This configuration prevents split brain by ensuring the resource owner
doesn't get killed accidentally by its peer. For those that remember,
ping-ponging was a big problem with RHEL2.1 clusters. If they couldn't
read their partner's disk timestamp update in a timely manner -- due to
IO latency or whatever -- they would reboot their partner node. On
reboot, the rebooted node would STONITH the other node, and so on.
Anyway, I hope this answers your questions. It is fairly easy to test.
Set up a 2-node cluster, then reboot the service owner. If the service
starts on the other node, you should be configured correctly. Next,
disconnect the service owner from the network. The service owner should
reboot itself with the watchdog or fail over the resource, depending on
how it's configured. Repeat this test with the non-service owner. (The
resources should not move in this case.) Then take turns disconnecting
them from the shared storage.
Cheers, jacob
-----------------------------------------------------------------------------
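Restating Jacob's test sequence as data makes it easier to work through as a
checklist. The expected results are his observations, not anything I have
verified myself:

    # Failure injected on a two-node cluster -> behavior Jacob says to expect.
    TEST_MATRIX = [
        ("reboot the service owner",
         "service starts on the other node"),
        ("disconnect the service owner from the network",
         "owner reboots itself (watchdog) or fails over, depending on config"),
        ("disconnect the non-owner from the network",
         "resources do not move"),
        ("disconnect each node in turn from shared storage",
         "exercise both nodes; he does not state the expected result"),
    ]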
RE: Tiebreaker IP
------------------------------------------------------------------------
* From: Lon Hohberger <lhh redhat com>
* To: linux clustering <linux-cluster redhat com>
* Subject: RE: Tiebreaker IP
* Date: Mon, 29 Aug 2005 15:19:40 -0400
------------------------------------------------------------------------
On Fri, 2005-08-26 at 13:24 -0500, JACOB_LIBERMAN Dell com wrote:
> If it can't, it will assume its external interface is down
> and fence/reboot itself. The same holds true for nodeB. Unlike RHEL2.1,
> which used STONITH, RHEL3 cluster nodes reboot themselves.
Both use STONITH. RHEL3 cluster nodes are more paranoid about running
without STONITH.
If STONITH is configured on a RHEL3 cluster, the node will instead wait
to be shot -- or for a new quorum to form -- if it loses network
connectivity.
> Anyway, I hope this answers your questions. It is fairly easy to test.
> Set up a 2-node cluster, then reboot the service owner. If the service
> starts on the other node, you should be configured correctly. Next,
> disconnect the service owner from the network. The service owner should
> reboot itself with the watchdog or fail over the resource, depending on
> how it's configured.
It should reboot itself because it loses quorum, really. Basically,
without STONITH, a node thinks like this on RHEL3:
"I was quorate and now I'm not, and no one can cut me off from shared
storage... Uh, oh, REBOOT!"
-- Lon
------------------------------------------------------------------------
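Lon's two cases reduce to a small decision rule. A minimal sketch, assuming
the behavior is exactly as he states it (pseudologic, not cluster source):

    def on_connectivity_loss(was_quorate, stonith_configured):
        # Illustrative RHEL3 node behavior on losing network connectivity.
        if not was_quorate:
            return "stay down; already inquorate"
        if stonith_configured:
            # With STONITH the node can safely wait: either it gets shot,
            # or a new quorum forms around it.
            return "wait to be fenced or for a new quorum"
        # Without STONITH, nothing can cut this node off from shared storage,
        # so the only safe move is: "I was quorate and now I'm not -- REBOOT!"
        return "reboot self"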
More from RH engineering:
-----------------------------------------------------------------------------
>The disk tiebreaker works in a similar way, except that it lets the
>cluster limp along in a safe, semi-split-brain state during a
>network outage. What I mean is that because there's state information
>written to/read from the shared raw partitions, the nodes can actually
>tell via other means whether or not the other node is "alive", as
>opposed to relying solely on the network traffic.
>Both nodes update state information on the shared partitions. When one
>node detects that the other node has not updated its information for a
>period of time, that node is "down" according to the disk subsystem. If
>this coincides with a "down" status from the membership daemon, the node
>is fenced and services are failed over. If the node never goes down
>(and keeps updating its information on the shared partitions), then the
>node is never fenced and services never fail over.
-- Lon
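A minimal sketch of the disk-tiebreaker bookkeeping Lon describes; the
staleness threshold and the names here are my assumptions, not RHCS values:

    import time

    STALE_AFTER = 10.0  # assumed seconds without an update before the peer looks dead

    def peer_down_on_disk(peer_last_update, now=None):
        # The peer is "down" to the disk subsystem once its state block on the
        # shared raw partition has not been refreshed within the threshold.
        now = time.time() if now is None else now
        return (now - peer_last_update) > STALE_AFTER

    def should_fence(down_on_disk, down_per_membership):
        # Fencing and failover require BOTH signals to agree; a node that keeps
        # writing its state block is never fenced, so services never fail over.
        return down_on_disk and down_per_membership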
14. What is a quorum disk/partition and what does it do for you?
A quorum disk or partition is a section of a disk that's set up
for use with components of the cluster project. It has a couple of
purposes. Again, I'll explain with an example.
Suppose you have nodes A and B, and node A fails to get several of
the cluster manager's "heartbeat" packets from node B. Node A doesn't
know why it hasn't received the packets, but there are several
possibilities: node B has failed, the network switch or hub
has failed, node A's network adapter has failed, or node B was
simply too busy to send the packet. That can
happen if your cluster is extremely large, your systems are
extremely busy, or your network is flaky.
Node A doesn't know which is the case, and it doesn't know whether
the problem lies within itself or with node B. This is especially
problematic in a two-node cluster because both nodes, out of touch
with one another, can try to fence the other.
So before fencing a node, it would be nice to have another way to
check if the other node is really alive, even though we can't seem
to contact it. A quorum disk gives you the ability to do just
that. Before fencing a node that's out of touch, the cluster
software can check whether the node is still alive based on
whether it has written data to the quorum partition.
In the case of two-node systems, the quorum disk also acts as a
tie-breaker. If a node has access to the quorum disk and the
network, that counts as two votes.
A node that has lost contact with the network or the quorum disk
has lost a vote, and therefore may safely be fenced.
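The FAQ's vote arithmetic for the two-node case comes down to this (a sketch;
real quorum-disk vote counts are configurable):

    def votes(has_network, has_quorum_disk):
        # Two-node tie-breaker: network contact + quorum-disk access = 2 votes.
        return int(has_network) + int(has_quorum_disk)

    def may_be_safely_fenced(has_network, has_quorum_disk):
        # Losing either the network or the quorum disk drops a vote, and a
        # node that has lost a vote may safely be fenced by its peer.
        return votes(has_network, has_quorum_disk) < 2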
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster