Re: Starting two-node cluster with only one node

Hello,


can you give us some hard facts on which versions of the cluster-suite
packages you are using in your environment, and also the related logs?

Have you read the corresponding parts of the cluster suite manual, man
pages, and FAQ, and searched the list archives for similar problems
already? If not, do it; there are many good hints to be found there.


The nodes find each other and form a cluster very quickly IF they can
talk to each other. Fencing a remote node involves no cluster
networking, so a node that is quorate by itself can fence the other one
even when the two cannot communicate over the cluster network. This
could be your problem.
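
To illustrate why both of your nodes report "Quorate" at the same time,
here is a minimal sketch of the quorum arithmetic as I understand it. It
is a simplified model, not cman's actual code, and the function names
are made up for this example; the values come from your posted
cluster.conf (expected_votes="1" two_node="1").

```python
def quorum_threshold(expected_votes, two_node=False):
    """Votes a partition needs to be quorate (simplified model)."""
    if two_node:
        # Two-node special case: a single vote is enough.
        return 1
    # Usual majority rule: more than half of the expected votes.
    return expected_votes // 2 + 1


def is_quorate(visible_votes, expected_votes, two_node=False):
    """True if the votes a node can see reach the quorum threshold."""
    return visible_votes >= quorum_threshold(expected_votes, two_node)


if __name__ == "__main__":
    # When the nodes cannot talk, each one sees only its own single vote.
    for node in ("node01", "node02"):
        print(node, "quorate alone:",
              is_quorate(1, expected_votes=1, two_node=True))
```

In this model each isolated node is quorate on its own, so each one
feels entitled to fence the other.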

You should change to fence_manual and switch back to your real fencing
devices after you have debugged your problem. Also get rid of the
<fence_daemon ... /> tag in your cluster.conf: fenced does the right
thing by default if the rest of the configuration is right, and at the
moment the tag is just hiding part of the problem.

Also, the 5-minute pause on cman start smells like a DNS lookup problem
or some other network-related problem to me.

Here is a short check-list to be sure the nodes can talk to each other:

Can the individual nodes ping each other?

Can the individual nodes resolve the other node names (the ones you used
in your cluster.conf) via DNS? (Try adding them to your /etc/hosts file;
that way you have a working cluster even if your DNS system goes on
vacation.)
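
A quick way to run the name check on each node could look like the
sketch below. It reads the clusternode names from
/etc/cluster/cluster.conf and asks the system resolver (which honours
/etc/hosts as well as DNS) for their addresses. The helper names are
made up for this example, and it assumes a Python 3 interpreter is
available on the node.

```python
import os
import socket
import xml.etree.ElementTree as ET

CLUSTER_CONF = "/etc/cluster/cluster.conf"


def node_names(cluster_conf_xml):
    """Return the name attribute of every <clusternode> element."""
    root = ET.fromstring(cluster_conf_xml)
    return [node.get("name") for node in root.iter("clusternode")]


def resolve(name):
    """Return the sorted addresses for name, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    if os.path.exists(CLUSTER_CONF):
        with open(CLUSTER_CONF) as fh:
            names = node_names(fh.read())
        for name in names:
            addrs = resolve(name)
            print(name, "->", ", ".join(addrs) if addrs else "DOES NOT RESOLVE")
    else:
        print(CLUSTER_CONF, "not found; run this on a cluster node")
```

Run it on both nodes. Every name should resolve, and to the same
address on each node.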

Is your switch allowing multicast communication on all ports that are
used for cluster communication? (This is a prerequisite for openais /
corosync based cman which would be anything >= RHEL 5. Search the
archives on this if you need more info...)
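
If you want to check multicast independently of cman, a small
sender/listener pair like the sketch below can help. The group
239.192.100.1 and port 50000 are arbitrary test values I picked for
this example, not what your cluster uses; choose something that does
not collide with the running cluster. It assumes Python 3 on both
nodes.

```python
import socket
import struct
import sys

GROUP = "239.192.100.1"  # hypothetical test group, not the cluster's
PORT = 50000             # hypothetical test port


def build_mreq(group, local_ip="0.0.0.0"):
    """Pack the ip_mreq structure: group address plus local interface."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(local_ip))


def listen(group, port, timeout=30.0):
    """Join the group and print each datagram until the timeout expires."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group))
    sock.settimeout(timeout)
    try:
        while True:
            data, (addr, _port) = sock.recvfrom(65535)
            print("received", len(data), "bytes from", addr)
    except socket.timeout:
        print("nothing received for", timeout, "seconds")
    finally:
        sock.close()


def send(group, port, message=b"multicast test"):
    """Send one datagram to the group with TTL 1 (local segment only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(message, (group, port))
    sock.close()


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "listen":
        listen(GROUP, PORT)
    elif mode == "send":
        send(GROUP, PORT)
    else:
        print("usage: listen | send")
```

Start it with "listen" on one node and with "send" on the other, then
swap the roles. If the listener never prints anything while unicast
ping works, the switch (or a firewall) is the likely suspect.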

Can you trace (e.g. with Wireshark's tshark) incoming cluster
communication from the remote nodes? (If you have not changed your
fencing to fence_manual, your listening system will get fenced before
you can get any useful information out of it. Try with and without an
active firewall.)

If all of the above can be answered with "yes", your cluster should form
just fine. You could then try adding a qdisk device as a tiebreaker and
test it, just to be sure you have a working last-man-standing setup...
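
To confirm that the cluster really formed instead of splitting, you can
compare the "cman_tool nodes" output of both nodes. The sketch below
does that for the output you posted. It assumes that "M" in the Sts
column marks a member and anything else a non-member, and the function
names are made up for this example.

```python
def parse_cman_nodes(output):
    """Map node name -> status letter from `cman_tool nodes` output."""
    status = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric node id; this skips the header.
        if len(fields) >= 4 and fields[0].isdigit():
            status[fields[-1]] = fields[1]
    return status


def is_split_brain(view_a, view_b):
    """True if the two nodes disagree about who is a cluster member."""
    members_a = {name for name, sts in view_a.items() if sts == "M"}
    members_b = {name for name, sts in view_b.items() if sts == "M"}
    return members_a != members_b


NODE01_OUTPUT = """\
Node  Sts   Inc   Joined               Name
   1   M    680   2009-07-17 00:30:42  node01.company.com
   2   X      0                        node02.company.com
"""

NODE02_OUTPUT = """\
Node  Sts   Inc   Joined               Name
   1   X      0                        node01.company.com
   2   M    676   2009-07-17 00:30:43  node02.company.com
"""

if __name__ == "__main__":
    view01 = parse_cman_nodes(NODE01_OUTPUT)
    view02 = parse_cman_nodes(NODE02_OUTPUT)
    print("split brain:", is_split_brain(view01, view02))
```

In a healthy two-node cluster both views list both nodes as members, so
the member sets are identical.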

Hope that helps,

Marc

On Thursday, 16.07.2009, at 23:41 -0700, Abed-nego G. Escobal, Jr.
wrote:
> 
> Thanks for the tip. According to the logs, it stopped the nodes from kicking each other out, but I still have a split-brain status.
> 
> On node01
> 
> # /usr/sbin/cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M    680   2009-07-17 00:30:42  node01.company.com
>    2   X      0                        node02.company.com
> 
> # /usr/sbin/clustat 
> Cluster Status for GFSCluster @ Fri Jul 17 01:01:09 2009
> Member Status: Quorate
> 
>  Member Name                             ID   Status
>  ------ ----                             ---- ------
>  node01.company.com                         1 Online, Local
>  node02.company.com                         2 Offline
> 
> 
> On node02
> 
> # /usr/sbin/cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   X      0                        node01.company.com
>    2   M    676   2009-07-17 00:30:43  node02.company.com
>  
> 
> # /usr/sbin/clustat
> Cluster Status for GFSCluster @ Fri Jul 17 01:01:22 2009
> Member Status: Quorate
> 
>  Member Name                             ID   Status
>  ------ ----                             ---- ------
>  node01.company.com                         1 Offline
>  node02.company.com                         2 Online, Local
> 
> 
> Another thing that I have noticed,
> 
> 1. Start node01 with only itself as the member of the cluster
> 2. Update cluster.conf to have node02 as an additional member
> 3. Start node02
> 
> This yields both nodes being quorate (split brain), but only node02 tries to fence node01. After some time, clustat shows both of them in the same cluster. I then try to start clvmd on node02, but it does not succeed. After the attempt to start the clvmd service, clustat shows split brain again.
> 
> Are there any troubleshooting steps that I should be taking?
> 
> 
> --- On Thu, 7/16/09, Aaron Benner <tfrumbacher@xxxxxxxxx> wrote:
> 
> > From: Aaron Benner <tfrumbacher@xxxxxxxxx>
> > Subject: Re:  Starting two-node cluster with only one node
> > To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> > Date: Thursday, 16 July, 2009, 10:04 PM
> > Have you tried setting the
> > "post_join_delay" value in the <fence_daemon ...>
> > declaration to -1?
> > 
> > <fence_daemon clean_start="0" post_fail_delay="0"
> > post_join_delay="-1" />
> > 
> > This is a hint I picked up from the fenced man page section
> > on avoiding boot time fencing.  It tells fenced to wait
> > until all of the nodes have joined the cluster before
> > starting up.  We use this on a couple of 2 node
> > clusters (with qdisk) to allow them to start up without the
> > first node to grab the quorum disk fencing the other node.
> > 
> > --Aaron
> > 
> > On Jul 16, 2009, at 12:16 AM, Abed-nego G. Escobal, Jr.
> > wrote:
> > 
> > > 
> > > 
> > > Tried it and now the two-node cluster is running with only one
> > > node. My problem right now is how to force the second node to
> > > join the first node's cluster. Right now it is creating its own
> > > cluster and trying to fence the first node. I tried cman_tool
> > > leave on the second node but I got
> > > 
> > > cman_tool: Error leaving cluster: Device or resource busy
> > > 
> > > clvmd and gfs are not running on the second node. What is
> > > running on the second node is cman. When I did
> > > 
> > > service cman start
> > > 
> > > it took approximately 5 minutes before I got the [ok] message.
> > > Am I missing something here? Am I doing something wrong? Is
> > > there something I should be doing?
> > > 
> > > 
> > > --- On Thu, 7/16/09, Abed-nego G. Escobal, Jr. <abednegoyulo@xxxxxxxxx>
> > wrote:
> > > 
> > >> From: Abed-nego G. Escobal, Jr. <abednegoyulo@xxxxxxxxx>
> > >> Subject:  Starting two-node cluster
> > with only one node
> > >> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> > >> Date: Thursday, 16 July, 2009, 10:46 AM
> > >> 
> > >> Using the config file below
> > >> 
> > >> <?xml version="1.0"?>
> > >> <cluster name="GFSCluster" config_version="5">
> > >>   <cman expected_votes="1" two_node="1"/>
> > >>   <clusternodes>
> > >>     <clusternode name="node01.company.com" votes="1" nodeid="1">
> > >>       <fence>
> > >>         <method name="single">
> > >>           <device name="node01_ipmi"/>
> > >>         </method>
> > >>       </fence>
> > >>     </clusternode>
> > >>     <clusternode name="node02.company.com" votes="1" nodeid="2">
> > >>       <fence>
> > >>         <method name="single">
> > >>           <device name="node02_ipmi"/>
> > >>         </method>
> > >>       </fence>
> > >>     </clusternode>
> > >>   </clusternodes>
> > >>   <fencedevices>
> > >>     <fencedevice name="node01_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.5" login="root" passwd="********"/>
> > >>     <fencedevice name="node02_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.7" login="root" passwd="********"/>
> > >>   </fencedevices>
> > >>   <rm>
> > >>     <failoverdomains/>
> > >>     <resources/>
> > >>   </rm>
> > >> </cluster>
> > >> 
> > >> Is it possible to start the cluster by bringing up only one
> > >> node? I ask because bringing them up together currently
> > >> produces a split brain: each node becomes a member of its own
> > >> GFSCluster cluster and they fence each other. My plan is to
> > >> bring up only one node to create a quorum, then bring the
> > >> other one up and manually join it to the existing cluster.
> > >> 
> > >> I have already tried the clean_start approach, but it does
> > >> not seem to work.
> > >> 

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
