Hello,

can you give us some hard facts on which versions of the cluster-suite packages you are using in your environment, and also the related logs? Have you read the corresponding parts of the Cluster Suite manual, man pages and FAQ, and searched the list archives for similar problems already? If not -> do it, there are many good hints to be found there.

The nodes find each other and create a cluster very fast IF they can talk to each other. Since no cluster networking is involved in fencing a remote node as long as the fencing node is itself quorate, this could be your problem. You should change to fence_manual and switch back to your real fencing devices after you have debugged your problem. Also get rid of the <fence_daemon ... /> tag in your cluster.conf: fenced does the right thing by default if the remaining configuration is right, and right now that tag is just hiding part of the problem. The ~5 minute delay on cman start also smells like a DNS lookup or other network-related problem to me.

Here is a short check-list to make sure the nodes can talk to each other:

1. Can the individual nodes ping each other?

2. Can the individual nodes DNS-resolve the other node names (the ones you used in your cluster.conf)? (Try adding them to your /etc/hosts file; that way you have a working cluster even if your DNS system goes on vacation.)

3. Is your switch allowing multicast communication on all ports that are used for cluster communication? (This is a prerequisite for the openais/corosync based cman, which means anything >= RHEL 5. Search the archives on this if you need more info.)

4. Can you trace (e.g. with Wireshark's tshark) incoming cluster communication from remote nodes? (If you haven't changed your fencing to fence_manual, your listening system will get fenced before you can get any useful information out of it. Try with and without an active firewall.)

If all of the above can be answered with "yes", your cluster should form just fine.
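The first two checks can be scripted; this is only a sketch (the node names are placeholders for whatever you put in your cluster.conf, and the tshark line at the end assumes the default openais/corosync UDP port 5405 and interface eth0):

```shell
#!/bin/sh
# Substitute the exact node names from your cluster.conf here.
NODES="node01.company.com node02.company.com"

check_node() {
    node="$1"
    # 1. Basic reachability
    if ping -c 1 -W 2 "$node" >/dev/null 2>&1; then
        echo "PING $node: ok"
    else
        echo "PING $node: FAILED"
    fi
    # 2. Name resolution -- getent also honours /etc/hosts,
    #    unlike a pure DNS query with dig/host
    if getent hosts "$node" >/dev/null 2>&1; then
        echo "NAME $node: ok"
    else
        echo "NAME $node: FAILED"
    fi
}

for n in $NODES; do
    check_node "$n"
done

# 3./4. Multicast and cluster traffic: run something like this on the
#       listening node while the other node starts cman (with fencing
#       switched to fence_manual first, or you will get fenced mid-trace):
#         tshark -i eth0 -f 'udp port 5405'
```

Run it on each node in turn; any FAILED line points at the layer to fix before touching the cluster configuration again.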
You could try to add a qdisk device as a tiebreaker after that, and test it just to be sure you have a working last-man-standing setup...

Hope that helps,
Marc

On Thursday, 2009-07-16 at 23:41 -0700, Abed-nego G. Escobal, Jr. wrote:
>
> Thanks for the tip. It helped by stopping each node from kicking the other out, as per the logs, but I still have a split-brain status.
>
> On node01
>
> # /usr/sbin/cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M    680   2009-07-17 00:30:42  node01.company.com
>    2   X      0                        node02.company.com
>
> # /usr/sbin/clustat
> Cluster Status for GFSCluster @ Fri Jul 17 01:01:09 2009
> Member Status: Quorate
>
>  Member Name          ID   Status
>  ------ ----          ---- ------
>  node01.company.com   1    Online, Local
>  node02.company.com   2    Offline
>
> On node02
>
> # /usr/sbin/cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   X      0                        node01.company.com
>    2   M    676   2009-07-17 00:30:43  node02.company.com
>
> # /usr/sbin/clustat
> Cluster Status for GFSCluster @ Fri Jul 17 01:01:22 2009
> Member Status: Quorate
>
>  Member Name          ID   Status
>  ------ ----          ---- ------
>  node01.company.com   1    Offline
>  node02.company.com   2    Online, Local
>
> Another thing that I have noticed:
>
> 1. Start node01 with only itself as the member of the cluster
> 2. Update cluster.conf to have node02 as an additional member
> 3. Start node02
>
> This yields both nodes being quorate (split brain), but only node02 tries to fence out node01. After some time, clustat shows both of them in the same cluster. Then I try to start clvmd on node02, without success. After trying to start the clvmd service, clustat shows the split brain again.
>
> Are there any troubleshooting steps that I should be doing?
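For reference, the qdisk tiebreaker mentioned above is configured with a <quorumd> stanza in cluster.conf. The following is only a sketch: the label, the timings, and the heuristic's ping target (a router at 10.1.0.1 is assumed here) are placeholders you must adapt, and with a qdisk you would normally drop two_node="1" and raise expected_votes accordingly; see the qdisk(5) man page for the details:

```xml
<quorumd interval="1" tko="10" votes="1" label="GFSQdisk">
    <heuristic program="ping -c1 -w1 10.1.0.1" score="1" interval="2" tko="3"/>
</quorumd>
```

The shared device itself is initialised once with mkqdisk (e.g. mkqdisk -c /dev/yourdisk -l GFSQdisk, device path hypothetical) before both nodes start qdiskd.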
>
> --- On Thu, 7/16/09, Aaron Benner <tfrumbacher@xxxxxxxxx> wrote:
>
> > From: Aaron Benner <tfrumbacher@xxxxxxxxx>
> > Subject: Re: Starting two-node cluster with only one node
> > To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> > Date: Thursday, 16 July, 2009, 10:04 PM
> >
> > Have you tried setting the "post_join_delay" value in the
> > <fence_daemon ...> declaration to -1?
> >
> > <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="-1" />
> >
> > This is a hint I picked up from the fenced man page section on
> > avoiding boot-time fencing. It tells fenced to wait until all of the
> > nodes have joined the cluster before starting up. We use this on a
> > couple of 2-node clusters (with qdisk) to allow them to start up
> > without the first node to grab the quorum disk fencing the other node.
> >
> > --Aaron
> >
> > On Jul 16, 2009, at 12:16 AM, Abed-nego G. Escobal, Jr. wrote:
> >
> > > Tried it and now the two-node cluster is running with only one node.
> > > My problem right now is how to force the second node to join the
> > > first node's cluster. Right now it is creating its own cluster and
> > > trying to fence the first node. I tried cman_tool leave on the
> > > second node but I got
> > >
> > > cman_tool: Error leaving cluster: Device or resource busy
> > >
> > > clvmd and gfs are not running on the second node. What is running
> > > on the second node is cman. When I did
> > >
> > > service cman start
> > >
> > > it took approximately 5 minutes before I got the [ok] message. Am I
> > > missing something here? Am I not doing something right? Should I be
> > > doing something else?
> > >
> > > --- On Thu, 7/16/09, Abed-nego G. Escobal, Jr. <abednegoyulo@xxxxxxxxx> wrote:
> > >
> > >> From: Abed-nego G. Escobal, Jr. <abednegoyulo@xxxxxxxxx>
> > >> Subject: Starting two-node cluster with only one node
> > >> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> > >> Date: Thursday, 16 July, 2009, 10:46 AM
> > >>
> > >> Using the config file below
> > >>
> > >> <?xml version="1.0"?>
> > >> <cluster name="GFSCluster" config_version="5">
> > >>   <cman expected_votes="1" two_node="1"/>
> > >>   <clusternodes>
> > >>     <clusternode name="node01.company.com" votes="1" nodeid="1">
> > >>       <fence>
> > >>         <method name="single">
> > >>           <device name="node01_ipmi"/>
> > >>         </method>
> > >>       </fence>
> > >>     </clusternode>
> > >>     <clusternode name="node02.company.com" votes="1" nodeid="2">
> > >>       <fence>
> > >>         <method name="single">
> > >>           <device name="node02_ipmi"/>
> > >>         </method>
> > >>       </fence>
> > >>     </clusternode>
> > >>   </clusternodes>
> > >>   <fencedevices>
> > >>     <fencedevice name="node01_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.5" login="root" passwd="********"/>
> > >>     <fencedevice name="node02_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.7" login="root" passwd="********"/>
> > >>   </fencedevices>
> > >>   <rm>
> > >>     <failoverdomains/>
> > >>     <resources/>
> > >>   </rm>
> > >> </cluster>
> > >>
> > >> Is it possible to start the cluster by bringing up only one node?
> > >> The reason why I ask is because currently bringing them up together
> > >> produces a split brain, each of them a member of its own GFSCluster
> > >> fencing the other. My plan is to bring up only one node to create a
> > >> quorum, then bring the other one up and manually join it to the
> > >> existing cluster.
> > >>
> > >> I have already done the start_clean approach but it seems it does
> > >> not work.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster