Re: Working of a two-node cluster

Hi,

I would advise you to use a quorum disk _only_ as a last resort - it's better to first get a solid understanding of the clustering solution before adding more complexity.
You can find an amazingly thorough and well-described tutorial here: https://alteeve.ca/w/AN!Cluster_Tutorial_2

Especially useful are the first chapters - the theory.
What I suspect is happening in your case is that your cluster communication and fencing run over the same network, which is not fault tolerant.
So what happens if this network fails? Your two nodes can't see each other, so they send fence requests, but the fence devices are unreachable too, so those requests fail.
They are retried a few times, I think, but if they all fail, the fence agent reports failure and your cluster is stuck in a "recovering" or stopped state.
Other times the network outage is shorter and the fencing succeeds, resulting in both nodes going down - that part is solved with the delay parameter.
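
Purely as an illustration, untested against your setup: if you want node-103 to be the survivor in a split, you could add a delay to the fence device that powers off node-103, so node-105 always gets fenced first. The 15 seconds below is an arbitrary example value:

                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="x.x.x.x" lanplus="on" login="admin" name="node-103" passwd="*****" privlvl="ADMINISTRATOR" delay="15"/>

With that, node-105's request to fence node-103 waits 15 seconds, while node-103 fences node-105 immediately, so only one node goes down.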
The first issue is an architectural one: it is the expected behavior of the cluster to stop (or "freeze") all resources if it cannot guarantee the state of all members.
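
The real fix is to make the cluster interconnect itself fault tolerant. If I remember correctly, cman on recent RHEL 6 releases supports a redundant ring: you give each clusternode an altname that resolves to its address on a second network - please check the Red Hat documentation before relying on this, and note that node-103-alt / node-105-alt below are made-up names for the second interface:

                <clusternode name="node-103" nodeid="1">
                        <altname name="node-103-alt"/>
                        ...
                </clusternode>
                <clusternode name="node-105" nodeid="2">
                        <altname name="node-105-alt"/>
                        ...
                </clusternode>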

Read the article above; it's really very useful.
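
And if, after reading it, you still decide you need a quorum disk, the cman side would look roughly like this - again untested, with the label, votes and heuristic target as placeholders; the shared device has to be visible to both nodes and initialized with mkqdisk first:

        <cman expected_votes="3"/>
        <quorumd label="qdisk01" votes="1">
                <heuristic program="ping -c1 -w1 x.x.x.x" score="1" interval="2" tko="3"/>
        </quorumd>

Note that with a quorum disk you drop two_node="1" and set expected_votes to the new total (two nodes plus the quorum disk).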

Cheers!

On Mon, Apr 27, 2015 at 9:44 AM, Vijay Kakkar <vijaykakkars@xxxxxxxxx> wrote:
You should look at qdisk now. I hope this will be helpful.

On Mon, Apr 27, 2015 at 11:38 AM, Jatin Davey <jashokda@xxxxxxxxx> wrote:
Yes, I did restart it.


On 4/27/2015 11:31 AM, emmanuel segura wrote:
Did you restart the cluster after adding the delay parameter?

2015-04-27 7:49 GMT+02:00 Jatin Davey <jashokda@xxxxxxxxx>:
OK, I tried with the delay but it has not helped. I guess I have to try using a
quorum disk now.

Thanks
Jatin

On 4/24/2015 7:06 PM, Vijay Kakkar wrote:

You may need to delay the fencing (delay=seconds) or use a quorum disk if
delaying the fencing doesn't help.

On Fri, Apr 24, 2015 at 6:23 PM, Jatin Davey <jashokda@xxxxxxxxx> wrote:
Here is my cluster.conf file

************************
<?xml version="1.0"?>
<cluster config_version="4" name="****">
        <clusternodes>
                <clusternode name="node-103" nodeid="1">
                        <fence>
                                <method name="Method01">
                                        <device name="node-103"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node-105" nodeid="2">
                        <fence>
                                <method name="Method02">
                                        <device name="node-105"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="x.x.x.x" lanplus="on" login="admin" name="node-103" passwd="*****" privlvl="ADMINISTRATOR"/>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="x.x.x.x" lanplus="on" login="admin" name="node-105" passwd="******" privlvl="ADMINISTRATOR"/>
        </fencedevices>
        <fence_daemon post_join_delay="120"/>
        <rm>
                <resources>
                        <netfs export="/test" force_unmount="1" fstype="nfs" host="x.x.x.x" mountpoint="/test/test/test" name="test123"/>
                        <ip address="x.x.x.x" sleeptime="5"/>
                        <script file="/xxx/xxx/xxx/xxx/xx.sh" name="xxxx"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="Failover01" nofailback="1" ordered="1">
                                <failoverdomainnode name="node-103" priority="1"/>
                                <failoverdomainnode name="node-105" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <service domain="Failover01" name="Service01" recovery="relocate">
                        <ip ref="x.x.x.x"/>
                        <netfs ref="test123"/>
                        <script ref="xxxx"/>
                </service>
        </rm>
</cluster>


On 4/24/2015 6:01 PM, emmanuel segura wrote:
Please share your cluster config; that way someone may be able to help you.

2015-04-24 14:12 GMT+02:00 Jatin Davey <jashokda@xxxxxxxxx>:
Hi

I am running a two-node cluster on RHEL 6.5. I have a very fundamental
question.

For the two-node cluster to work, is it mandatory that both nodes are
"online" and communicating with each other?

What I can see is that if there is a communication failure between them,
then either both nodes are fenced or the cluster gets into a "stopped"
state (seen in the output of the clustat command).

Apologies if my questions are naive. I am just starting to work with the
RHEL cluster add-on.

Thanks
Jatin

--
Cheers

Vijay Kakkar - RHC{E,SS,VA,DS,A,I,X}

Techgrills Systems Pvt. Ltd.
011-46521313 | +919999103657
http://www.techgrills.com
http://lnkd.in/bnj2VUU

--
Cheers

Vijay Kakkar - RHC{E,SS,VA,DS,A,I,X}

Techgrills Systems Pvt. Ltd.
011-46521313 | +919999103657

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
