Re: cannot add 3rd node to running cluster

hi,
maybe your cluster has a votes problem.

When you run "cman_tool status" and you see a message like
"Quorum: 2 Activity blocked", run "cman_tool expected -e 1". Run
"cman_tool status" again and "Activity blocked" should be gone from the
"Quorum:" line.

That command lowers the cluster's expected votes, and with them the
quorum (the quorum is expected_votes/2 + 1, rounding down), so the
cluster can run again. This is only a temporary workaround for an
inadequate cluster quorum. (Before doing this, check
config_version="{ver.num}": it must be the same number on all nodes.)
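
For example, the sequence might look like this (output abridged; the
hostname, vote counts, and exact layout are illustrative and vary by
version):

[root@node1 ~]# cman_tool status
...
Nodes: 1
Expected votes: 3
Total votes: 1
Quorum: 2  Activity blocked
...
[root@node1 ~]# cman_tool expected -e 1   # temporary: lowers expected votes, and with them the quorum
[root@node1 ~]# cman_tool status
...
Quorum: 1
...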

I hope this solves your problem.

(yep, I know... my English is ultra-professional-imba :) )


2010/1/22 Terry <td3201@xxxxxxxxx>
On Fri, Jan 22, 2010 at 9:00 AM, King, Adam <adam.king@xxxxxxxxxxxxxxxx> wrote:
> I'm assuming you have read this? http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_2to3
>
>
>
>
> Adam King
> Systems Administrator
> adam.king@xxxxxxxxxxxxxxxx
>
>
> InTechnology plc
> Support 0845 120 7070
> Telephone 01423 850000
> Facsimile 01423 858866
> www.intechnology.com
>
>
> -----Original Message-----
>
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Terry
> Sent: 22 January 2010 14:45
> To: linux clustering
> Subject: Re: cannot add 3rd node to running cluster
>
> On Mon, Jan 4, 2010 at 1:34 PM, Abraham Alawi <a.alawi@xxxxxxxxxxxxxx> wrote:
>>
>> On 1/01/2010, at 5:13 AM, Terry wrote:
>>
>>> On Wed, Dec 30, 2009 at 10:13 AM, Terry <td3201@xxxxxxxxx> wrote:
>>>> On Tue, Dec 29, 2009 at 5:20 PM, Jason W. <jwellband@xxxxxxxxx> wrote:
>>>>> On Tue, Dec 29, 2009 at 2:30 PM, Terry <td3201@xxxxxxxxx> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I have a working two-node cluster that I am trying to add a third
>>>>>> node to.  I am trying to use Red Hat's Conga (luci) to add the node in but
>>>>>
>>>>> If you have a two-node cluster with two_node=1 in cluster.conf - such as
>>>>> two nodes with no quorum device to break a tie - you'll need to bring
>>>>> the cluster down, change two_node to 0 on both nodes (and rev the
>>>>> cluster version at the top of cluster.conf), bring the cluster up and
>>>>> then add the third node.
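>>>>>
>>>>> For example (the attribute values here are illustrative, not taken
>>>>> from your config), the cman line would go from something like
>>>>>
>>>>>   <cman two_node="1" expected_votes="1"/>
>>>>>
>>>>> to
>>>>>
>>>>>   <cman two_node="0" expected_votes="3"/>
>>>>>
>>>>> with config_version at the top of the file incremented by one.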
>>>>>
>>>>> For troubleshooting any cluster issue, take a look at syslog
>>>>> (/var/log/messages by default). It can help to watch it on a
>>>>> centralized syslog server that all of your nodes forward logs to.
>>>>>
>>>>> --
>>>>> HTH, YMMV, HANW :)
>>>>>
>>>>> Jason
>>>>>
>>>>> The path to enlightenment is /usr/bin/enlightenment.
>>>>
>>>> Thank you for the response.  /var/log/messages doesn't have any
>>>> errors.  It says cman started, then a few seconds later says it can't
>>>> connect to the cluster infrastructure.  My cluster does not have the
>>>> two_node=1 setting now; Conga took that out for me.  That bit me last
>>>> night because I needed to put it back in.
>>>>
>>>
>>> CMAN still will not start and gives no debug information.  Anyone know
>>> why cman_tool -d join would not print any output at all?
>>> Troubleshooting this is kind of a nightmare.  I verified that two_node
>>> is not in play.
>>>
>>
>>
>> Try this line in your cluster.conf file:
>> <logging debug="on" logfile="/var/log/rhcs.log" to_file="yes"/>
>>
>> Also, if you are sure your cluster.conf is correct, copy it manually to all the nodes, add clean_start="1" to the fence_daemon line in cluster.conf, and run 'service cman start' simultaneously on all the nodes. (It is probably a good idea to do that from runlevel 1, but make sure you have the network up first.)
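>>
>> For example, a fence_daemon line with clean_start enabled might look
>> like this (the delay values are illustrative defaults):
>>
>>   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>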
>>
>> Cheers,
>>
>>  -- Abraham
>>
>> ''''''''''''''''''''''''''''''''''''''''''''''''''''''
>> Abraham Alawi
>>
>> Unix/Linux Systems Administrator
>> Science IT
>> University of Auckland
>> e: a.alawi@xxxxxxxxxxxxxx
>> p: +64-9-373 7599, ext#: 87572
>>
>> ''''''''''''''''''''''''''''''''''''''''''''''''''''''
>>
>>
>
> I am still battling this.  I stopped the cluster completely, modified
> the config, and then started it, but that didn't work either.  Same
> issue.  I noticed clurgmgrd wasn't staying up, so I then tried
> this:
>
> [root@omadvnfs01c ~]# clurgmgrd -d -f
> [7014] notice: Waiting for CMAN to start
>
> Then in another window I issued:
> [root@omadvnfs01c ~]# cman_tool join
>
>
> Then back in the other window below "[7014] notice: Waiting for CMAN
> to start", I got:
> failed acquiring lockspace: Transport endpoint is not connected
> Locks not working!
>
> Anyone know what could be going on?
>

I hadn't, but I performed those steps anyway.  As it sits, I have a
three-node cluster with only two nodes in it, which is bad too, but it
is what it is until I figure this out.  Here's my cluster.conf just
for completeness:

<cluster alias="omadvnfs01" config_version="53" name="omadvnfs01">
       <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
       <clusternodes>
               <clusternode name="omadvnfs01a.sec.jel.lc" nodeid="1" votes="1">
                       <fence>
                               <method name="1">
                                       <device name="omadvnfs01a-drac"/>
                               </method>
                       </fence>
               </clusternode>
               <clusternode name="omadvnfs01b.sec.jel.lc" nodeid="2" votes="1">
                       <fence>
                               <method name="1">
                                       <device name="omadvnfs01b-drac"/>
                               </method>
                       </fence>
               </clusternode>
               <clusternode name="omadvnfs01c.sec.jel.lc" nodeid="3" votes="1">
                       <fence>
                               <method name="1">
                                       <device name="omadvnfs01c-drac"/>
                               </method>
                       </fence>
               </clusternode>
       </clusternodes>
       <cman/>
        <fencedevices>
                <fencedevice agent="fence_drac" ipaddr="10.98.1.211" login="root" name="omadvnfs01a-drac" passwd="foo"/>
                <fencedevice agent="fence_drac" ipaddr="10.98.1.212" login="root" name="omadvnfs01b-drac" passwd="foo"/>
                <fencedevice agent="fence_drac" ipaddr="10.98.1.213" login="root" name="omadvnfs01c-drac" passwd="foo"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="fd_omadvnfs01a-nfs" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="omadvnfs01a.sec.jel.lc" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="fd_omadvnfs01b-nfs" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="omadvnfs01b.sec.jel.lc" priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="fd_omadvnfs01c-nfs" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="omadvnfs01c.sec.jel.lc" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
        </rm>
</cluster>

I am not sure whether I did a restart after I did the work, though.
When it says "shutdown cluster software", is that simply a 'service
cman stop' on Red Hat?  I want to make sure I don't need to kill any
other components before updating the configuration manually.  I
appreciate the help.  I will probably try it again this afternoon to
double-check my work.
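
Here's the order I plan to use when I retry (please correct me if I'm
missing a component; gfs/clvmd only apply where they are in use):

[root@omadvnfs01c ~]# service rgmanager stop   # resource manager first
[root@omadvnfs01c ~]# service gfs stop         # only if GFS mounts are in use
[root@omadvnfs01c ~]# service clvmd stop       # only if clustered LVM is in use
[root@omadvnfs01c ~]# service cman stop
  (edit /etc/cluster/cluster.conf, bump config_version, copy to all nodes)
[root@omadvnfs01c ~]# service cman start
[root@omadvnfs01c ~]# service rgmanager start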

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
