Re: Cman hang

Juan Ramon Martin Blanco <robejrm@xxxxxxxxx> · Thu, 13 Aug 2009 18:10:58 +0200

On Thu, Aug 13, 2009 at 5:45 PM, NTOUGHE GUY-SERGE <ntoughe@xxxxxxxxxxx> wrote:

Thanks, Juanra,
I did that and tried to restart the cman service, still without success. These are the messages I got:

my new cluster.conf:
<?xml version="1.0"?>
<cluster alias="arevclust" config_version="1" name="arevclust">

        <clusternodes>
        <cman expected_votes="1" two_node="1">
        </cman>
                <clusternode name="gs21spli003.occ.lan" nodeid="1" votes="1">

                </clusternode>
                <clusternode name="gs21spli004.occ.lan" nodeid="2" votes="1">
You should put the node names in /etc/hosts so that they resolve to the IP addresses configured on the network interfaces you want to use for cluster communications.
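
As a rough sketch only (this is not taken from your setup), the shell functions below look up each node name in a hosts file and print the address it maps to, so you can compare it with the address on the interface you want for cluster traffic. The function names lookup_node and check_nodes are made up for this illustration.

```shell
#!/bin/sh
# Sketch: report which address each cluster node name maps to in a hosts file.
# lookup_node and check_nodes are hypothetical helper names, not cluster tools.

# lookup_node HOSTS_FILE NODE_NAME
# Prints the first address whose entry lists NODE_NAME as a name or alias.
lookup_node() {
    awk -v name="$2" '
        { sub(/#.*/, "") }   # drop comments before matching
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# check_nodes HOSTS_FILE NODE...
# Prints "node: address" for every node; returns non-zero if any is missing.
check_nodes() {
    hosts_file=$1
    shift
    status=0
    for node in "$@"; do
        addr=$(lookup_node "$hosts_file" "$node")
        if [ -z "$addr" ]; then
            echo "$node: not found in $hosts_file"
            status=1
        else
            echo "$node: $addr"
        fi
    done
    return $status
}
```

With the node names from your cluster.conf this would be called as `check_nodes /etc/hosts gs21spli003.occ.lan gs21spli004.occ.lan`. On a running system, `getent hosts NAME` shows what the resolver actually returns for a name.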

Greetings,
Juanra 

                </clusternode>
        </clusternodes>

</cluster>

cman not started: Multicast and node address families differ. /usr/sbin/cman_tool: aisexec daemon didn't start

When I tried to mount the GFS file system I got this:
# mount -t gfs2 /dev/mapper/VolGroup01-LogVol01  /appli/prod
/sbin/mount.gfs2: can't connect to gfs_controld: Connection refused
/sbin/mount.gfs2: can't connect to gfs_controld: Connection refused

/sbin/mount.gfs2: can't connect to gfs_controld: Connection refused

I'm not sure, but I have a doubt about the lock table name I used when creating the GFS file system:
mkfs.gfs2 -p lock_dlm -t arevclust:appli/prod -j 1 /dev/mapper/VolGroup00-LogVol01 (the name of the cluster is arevclust)

The GFS file system is created on /dev/mapper/volGroup00-LogVol01 and the mount point is /appli/prod, so to name my GFS file system I put appli/prod. Is that correct?
Regards

ntoughe@xxxxxxxxxxx

> From: linux-cluster-request@xxxxxxxxxx
> Subject: Linux-cluster Digest, Vol 64, Issue 18
> To: linux-cluster@xxxxxxxxxx

> Date: Thu, 13 Aug 2009 11:24:21 -0400
> 
> Send Linux-cluster mailing list submissions to
> 	linux-cluster@xxxxxxxxxx
> 
> To subscribe or unsubscribe via the World Wide Web, visit

> 	https://www.redhat.com/mailman/listinfo/linux-cluster
> or, via email, send a message with subject or body 'help' to

> 	linux-cluster-request@xxxxxxxxxx
> 
> You can reach the person managing the list at
> 	linux-cluster-owner@xxxxxxxxxx

> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Linux-cluster digest..."
> 
> 
> Today's Topics:
> 
>    1. Re: Qdisk question (brem belguebli)

>    2. Re: Cman hang (Juan Ramon Martin Blanco)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 13 Aug 2009 17:23:16 +0200

> From: brem belguebli <brem.belguebli@xxxxxxxxx>
> Subject: Re: [Linux-cluster] Qdisk question
> To: linux clustering <linux-cluster@xxxxxxxxxx>

> Message-ID:
> 	<29ae894c0908130823i65667021vdc840ae1f0ded134@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="iso-8859-1"

> 
> Hi Lon and Thanks for this reply.
> 
> In fact, thinking about it, my test wasn't very representative of what
> I was expecting to do.
> 
> I blocked the qdisk communications to only one node which, after reading
> your reply, kind of confirmed to me that I did the wrong test. I'm going to
> rerun it by blocking all the nodes from the qdisk.
> 
> I'll also try your ping tie-breaker.
> 
> Brem

> 
> 
> 2009/8/13, Lon Hohberger <lhh@xxxxxxxxxx>:
> >
> > On Thu, 2009-08-13 at 00:45 +0200, brem belguebli wrote:
> >

> > > My understanding of qdisk is that it is used as a tie-breaker, but it
> > > looks like it is more a heartbeat vector than a simple tie-breaker.
> >
> > Right, it's a secondary membership algorithm.

> >
> >
> > > Until here, no real problem indeed, if the site gets apart from the
> > > other prod site and also from the third site (hosting the iscsi target
> > > qdisk) the 2 nodes from the failing site get evicted from the cluster.

> > >
> > >
> > > But, what if my third site gets isolated while the 2 prod ones are
> > > fine ?
> >
> > Qdisk votes will not be presented to CMAN any more, but the two sites

> > should remain online if they still have a "majority" of votes.
> >
> >
> > > The real question is what happens in case all the nodes lose access
> > > to the qdisk while they're still able to see each other?

> >
> > Qdisk is just a vote like other voting mechanisms.  If all nodes lose
> > access at the same time, it should behave like a node death.  However,
> > the default action if _one_ node loses access is to kill that node (even

> > if CMAN still sees it).
> >
> >
> > > The 4 nodes have each 1 vote and the qdisk 1 vote. The expected quorum
> > > is 3.
> >
> >
> > > If I lose the qdisk, the number of votes falls to 4, the cluster is
> > > quorate (4>3) but it looks like everything goes bad: each node
> > > deactivates itself as it can't write its alive status (--> heartbeat
> > > vector) to the qdisk even if the network heartbeating is working
> > > fine.
> >
> > What happens specifically?  Most of the actions qdiskd performs are
> > configurable.  For example, if the nodes are rebooting, you can turn
> > that behavior off.

> >
> >
> >
> > I wrote a simple 'ping' tiebreaker based on the behaviors in RHEL3.  It
> > functions in many ways in the same manner as qdiskd with respect to vote
> > advertisement to CMAN, but without needing a disk - maybe you would find
> > it useful?
> >
> > http://people.redhat.com/lhh/qnet.tar.gz
> >
> > -- Lon
> >
> > --

> > Linux-cluster mailing list
> > Linux-cluster@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/linux-cluster

> >

> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 13 Aug 2009 17:23:54 +0200
> From: Juan Ramon Martin Blanco <robejrm@xxxxxxxxx>

> Subject: Re:  Cman hang
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Message-ID:
> 	<8a5668960908130823u11b46ad9pd1da5af3614ce3d3@xxxxxxxxxxxxxx>

> Content-Type: text/plain; charset="iso-8859-1"
> 
> On Thu, Aug 13, 2009 at 5:13 PM, NTOUGHE GUY-SERGE <ntoughe@xxxxxxxxxxx>wrote:

> 
> >  Hi, this is my cluster.conf
> > <?xml version="1.0"?>
> > <cluster alias="arevclust" config_version="21" name="arevclust">
> >   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>

> >   <clusternodes>
> >   <clusternode name="host1" nodeid="1" votes="1">
> >   <fence>
> >   <method name="2">
> >

> 
> >   <device name=""/>
> >
> You should configure a valid fencing method, and if you don't have any, use
> fence_manual until you get it.
> 
> >
> >   </method>

> >   </fence>
> >   <multicast addr="" interface=""/>
> >
> I am not sure, but I think you should remove this <multicast ...> tag.
> 
> Greetings,

> Juanra
> 
> >
> >   </clusternode>
> >
> 
> >   <clusternode name="host2" nodeid="2" votes="1">
> >   <fence>

> >   <method name="1">
> >   <device name=""/>
> >   </method>
> >   <method name=""/>
> >   </fence>
> >   <multicast addr="" interface=""/>

> >   </clusternode>
> >   </clusternodes>
> >   <cman expected_votes="" two_node="">
> >   <multicast addr=""/>
> >   </cman>

> >   <fencedevices>
> >   <fencedevice agent="fence_brocade" ipaddr="" login="" name="" passwd=""/>
> >   </fencedevices>
> >   <rm>

> >   <failoverdomains>
> >   </failoverdomains>
> >   <resources>
> >   </resources>
> >   </rm>
> > </cluster>
> >
> > Regards

> >
> >
> >
> >
> > ntoughe@xxxxxxxxxxx
> >
> >
> >
> >
> > > From: linux-cluster-request@xxxxxxxxxx

> > > Subject: Linux-cluster Digest, Vol 64, Issue 16
> > > To: linux-cluster@xxxxxxxxxx
> > > Date: Thu, 13 Aug 2009 11:02:36 -0400

> > >
> > >
> > >
> > > Today's Topics:

> > >
> > > 1. Re: do I have a fence DRAC device? (ESGLinux)
> > > 2. clusterservice stays in 'recovering' state (mark benschop)
> > > 3. Re: Is there any backup heartbeat channel (Hakan VELIOGLU)

> > > 4. Re: Is there any backup heartbeat channel
> > > (Juan Ramon Martin Blanco)
> > > 5. RHCS on KVM (Nehemias Jahcob)
> > > 6. Cman hang (NTOUGHE GUY-SERGE)
> > > 7. Re: gfs2 mount hangs (David Teigland)

> > > 8. Re: Qdisk question (Lon Hohberger)
> > > 9. Re: Cman hang (Juan Ramon Martin Blanco)
> > >
> > >
> > > ----------------------------------------------------------------------

> > >
> > > Message: 1
> > > Date: Thu, 13 Aug 2009 13:27:16 +0200
> > > From: ESGLinux <esggrupos@xxxxxxxxx>

> > > Subject: Re:  do I have a fence DRAC device?
> > > To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > Message-ID:

> > > <3128ba140908130427i6ab85406ye6da34073e6a6e97@xxxxxxxxxxxxxx>
> > > Content-Type: text/plain; charset="iso-8859-1"

> > >
> > > Hi,
> > > I couldn't reboot my system yet, but I have installed the OpenManage
> > > packages:
> > >
> > > srvadmin-omacore-5.4.0-260
> > > srvadmin-iws-5.4.0-260

> > > srvadmin-syscheck-5.4.0-260
> > > srvadmin-rac5-components-5.4.0-260
> > > srvadmin-deng-5.4.0-260
> > > srvadmin-ipmi-5.4.0-260.DUP
> > > srvadmin-racadm5-5.4.0-260

> > > srvadmin-omauth-5.4.0-260.rhel5
> > > srvadmin-hapi-5.4.0-260
> > > srvadmin-cm-5.4.0-260
> > > srvadmin-racdrsc5-5.4.0-260
> > > srvadmin-omilcore-5.4.0-260

> > > srvadmin-isvc-5.4.0-260
> > > srvadmin-storage-5.4.0-260
> > > srvadmin-jre-5.4.0-260
> > > srvadmin-omhip-5.4.0-260
> > >
> > > Now I have the racadm command, but when I try to execute it I get this:

> > >
> > > racadm config -g cfgSerial -o cfgSerialTelnetEnable 1
> > > ERROR: RACADM is unable to process the requested subcommand because there is no
> > > local RAC configuration to communicate with.

> > >
> > > Local RACADM subcommand execution requires the following:
> > >
> > > 1. A Remote Access Controller (RAC) must be present on the managed server
> > > 2. Appropriate managed node software must be installed and running on the

> > > server
> > >
> > >
> > > What do I need to install or start? Or can't I get this to work until I
> > > configure the BIOS?
> > >
> > > Greetings

> > >
> > > ESG
> > >
> > >
> > > 2009/8/11 <bergman@xxxxxxxxxxxx>
> > >
> > > >

> > > >
> > > > In the message dated: Tue, 11 Aug 2009 14:14:03 +0200,
> > > > The pithy ruminations from Juan Ramon Martin Blanco on
> > > > <Re:  do I have a fence DRAC device?> were:

> > > > => On Tue, Aug 11, 2009 at 2:03 PM, ESGLinux <esggrupos@xxxxxxxxx>

> > wrote:
> > > > =>
> > > > => > Thanks
> > > > => > I'll check it when I can reboot the server.
> > > > => >
> > > > => > greetings,

> > > > => >
> > > > => You have a BMC (IPMI) on the first network interface; it can be configured at
> > > > => boot time (I don't remember whether inside the BIOS or by pressing Ctrl+something
> > > > => during boot).
> > > > =>
> > > >
> > > > Based on my notes, here's how I configured the DRAC interface on a Dell
> > > > 1950 for use as a fence device:
> > > >
> > > > Configuring the card from Linux depends on the installation of Dell's
> > > > OMSA package. Once that's installed, use the following commands:
> > > >
> > > > racadm config -g cfgSerial -o cfgSerialTelnetEnable 1
> > > > racadm config -g cfgLanNetworking -o cfgDNSRacName HOSTNAME_FOR_INTERFACE
> > > > racadm config -g cfgLanNetworking -o cfgDNSDomainName DOMAINNAME_FOR_INTERFACE
> > > > racadm config -g cfgUserAdmin -o cfgUserAdminPassword -i 2 PASSWORD
> > > > racadm config -g cfgLanNetworking -o cfgNicEnable 1
> > > > racadm config -g cfgLanNetworking -o cfgNicIpAddress WWW.XXX.YYY.ZZZ
> > > > racadm config -g cfgLanNetworking -o cfgNicNetmask WWW.XXX.YYY.ZZZ
> > > > racadm config -g cfgLanNetworking -o cfgNicGateway WWW.XXX.YYY.ZZZ
> > > > racadm config -g cfgLanNetworking -o cfgNicUseDhcp 0
> > > >
> > > >
> > > > I also save a backup of the configuration with:
> > > >
> > > > racadm getconfig -f ~/drac_config

> > > >
> > > >
> > > > Hope this helps,
> > > >
> > > > Mark
> > > >
> > > > ----
> > > > Mark Bergman voice: 215-662-7310

> > > > mark.bergman@xxxxxxxxxxxxxx fax: 215-614-0266
> > > > System Administrator Section of Biomedical Image Analysis
> > > > Department of Radiology University of Pennsylvania

> > > > PGP Key: https://www.rad.upenn.edu/sbia/bergman
> > > >
> > > >
> > > > => Greetings,

> > > > => Juanra
> > > > =>
> > > > => >
> > > > => > ESG
> > > > => >
> > > > => > 2009/8/10 Paras pradhan <pradhanparas@xxxxxxxxx>

> > > > => >
> > > > => > On Mon, Aug 10, 2009 at 5:24 AM, ESGLinux<esggrupos@xxxxxxxxx>
> > wrote:
> > > > => >> > Hi all,

> > > > => >> > I was designing a 2-node cluster and I was going to use 2 Dell
> > > > => >> > PowerEdge 1950 servers. I was going to buy a DRAC card to use for
> > > > => >> > fencing, but running several commands on the servers I noticed that
> > > > => >> > when I run this command:
> > > > => >> > #ipmitool lan print
> > > > => >> > Set in Progress : Set Complete
> > > > => >> > Auth Type Support : NONE MD2 MD5 PASSWORD

> > > > => >> > Auth Type Enable : Callback : MD2 MD5
> > > > => >> > : User : MD2 MD5
> > > > => >> > : Operator : MD2 MD5
> > > > => >> > : Admin : MD2 MD5

> > > > => >> > : OEM : MD2 MD5
> > > > => >> > IP Address Source : Static Address
> > > > => >> > IP Address : 0.0.0.0
> > > > => >> > Subnet Mask : 0.0.0.0

> > > > => >> > MAC Address : 00:1e:c9:ae:6f:7e
> > > > => >> > SNMP Community String : public
> > > > => >> > IP Header : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10

> > > > => >> > Default Gateway IP : 0.0.0.0
> > > > => >> > Default Gateway MAC : 00:00:00:00:00:00
> > > > => >> > Backup Gateway IP : 0.0.0.0

> > > > => >> > Backup Gateway MAC : 00:00:00:00:00:00
> > > > => >> > 802.1q VLAN ID : Disabled
> > > > => >> > 802.1q VLAN Priority : 0
> > > > => >> > RMCP+ Cipher Suites : 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14

> > > > => >> > Cipher Suite Priv Max : aaaaaaaaaaaaaaa
> > > > => >> > : X=Cipher Suite Unused
> > > > => >> > : c=CALLBACK
> > > > => >> > : u=USER

> > > > => >> > : o=OPERATOR
> > > > => >> > : a=ADMIN
> > > > => >> > : O=OEM
> > > > => >> > Does this mean that I already have an IPMI card (not configured) that
> > > > => >> > I can use for fencing? If the answer is yes, where the hell must I
> > > > => >> > configure it? I don't see where I can do it.
> > > > => >> > If I don't have a fencing device, which one do you recommend using?
> > > > => >> > Thanks in advance
> > > > => >> > ESG
> > > > => >> >

> > > > => >> >
> > > > => >>
> > > > => >> Yes, you have IPMI, and if you are using a Dell 1950, DRAC should be
> > > > => >> there too. You can see whether you have DRAC when the server starts,
> > > > => >> before the OS loads.
> > > > => >>
> > > > => >> I have 1850s and I am using DRAC for fencing.
> > > > => >>

> > > > => >>
> > > > => >> Paras.
> > > > => >>

> > > > => >>
> > > > => >
> > > > => >
> > > >
> > > >
> > > >

> > > >

> > >
> > > ------------------------------
> > >
> > > Message: 2
> > > Date: Thu, 13 Aug 2009 14:45:13 +0200
> > > From: mark benschop <mark.benschop.lists@xxxxxxxxx>

> > > Subject:  clusterservice stays in 'recovering' state
> > > To: linux-cluster@xxxxxxxxxx
> > > Message-ID:

> > > <f97c3a70908130545n11ce442ej17d74c9cdc450e45@xxxxxxxxxxxxxx>
> > > Content-Type: text/plain; charset="iso-8859-1"

> > >
> > > Hi All,
> > >
> > > I have a problem with a cluster service. The service was started while one
> > > of its resources, an NFS export, was not accessible.
> > > Therefore the service never started properly but went into the 'recovering'
> > > state.
> > > In the meantime the NFS exports have been set up properly, but to no avail.
> > > Stopping the cluster service using clusvcadm -d <service> results in
> > > the service going down but staying in the 'recovering' state.
> > > Starting it again doesn't work. The service doesn't start and stays in the
> > > 'recovering' state.
> > > I suspect rgmanager lost track of it somehow.
> > >
> > > Does anybody have any idea what the problem could be and how to resolve it?
> > >
> > > Thanks in advance,
> > > Mark

> > >
> > > ------------------------------
> > >
> > > Message: 3
> > > Date: Thu, 13 Aug 2009 16:13:12 +0300
> > > From: Hakan VELIOGLU <veliogluh@xxxxxxxxxx>

> > > Subject: Re:  Is there any backup heartbeat channel
> > > To: linux-cluster@xxxxxxxxxx
> > > Message-ID: <20090813161312.11546h2sp6psr814@xxxxxxxxxxxxxxxxxx>

> > > Content-Type: text/plain; charset=ISO-8859-9; DelSp="Yes";
> > > format="flowed"
> > >
> > > Thanks for all the answers.
> > >
> > > I think there is really no backup heartbeat channel. Maybe the reason
> > > is GFS. DLM works on the heartbeat channel. If you lose your heartbeat
> > > you lose your lock consistency, so it is better to fence the other
> > > node. For this reason I think that if you don't have enough network
> > > interfaces on the server and switch, losing the heartbeat network may shut
> > > down all the cluster members.
> > >
> > > Hakan VELİOĞLU
> > >
> > >
> > > ----- Message from robejrm@xxxxxxxxx ---------
> > > Date: Thu, 13 Aug 2009 10:42:11 +0200
> > > From: Juan Ramon Martin Blanco <robejrm@xxxxxxxxx>
> > > Reply-To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > Subject: Re:  Is there any backup heartbeat channel
> > > To: linux clustering <linux-cluster@xxxxxxxxxx>
> > >

> > >
> > > > 2009/8/13 Hakan VELIOGLU <veliogluh@xxxxxxxxxx>
> > > >
> > > >> ----- Message from raju.rajsand@xxxxxxxxx ---------
> > > >> Date: Thu, 13 Aug 2009 08:57:15 +0530
> > > >> From: Rajagopal Swaminathan <raju.rajsand@xxxxxxxxx>
> > > >> Reply-To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > >> Subject: Re:  Is there any backup heartbeat channel
> > > >> To: linux clustering <linux-cluster@xxxxxxxxxx>

> > > >>
> > > >>
> > > >> Greetings,
> > > >>>
> > > >>> 2009/8/12 Hakan VELIOGLU <veliogluh@xxxxxxxxxx>:

> > > >>>
> > > >>>> Hi list,
> > > >>>>
> > > >>>> I am trying a two node cluster with RH 5.3 on Sun X4150 hardware. I
> > use a

> > > >>>>
> > > >>>
> > > >>> IIRC, Sun x4150 has four ethernet ports. Two can be used for outside
> > > >>> networking and two can be bonded and used for heartbeat.

> > > >>>
> > > >> I think I didn't explain my networking well. I use two ethernet ports for
> > > >> Xen VMs; these are trunked and bonded ports. That leaves two. Our network
> > > >> topology (which is out of my control) makes one port available for server
> > > >> control (SSH).
> > > >
> > > > So you can't use a bonded port for both server management and cluster
> > > > communications, can you? You can configure active-passive bonding and
> > > > then have many virtual interfaces on top of that, e.g. bond0:0,
> > > > bond0:1, and assign them the IP addresses you need.

> > > >
> > > >
> > > >> I use the other one with a crossover cable for heartbeat. So there is no
> > > >> way of bonding these two interfaces. Of course, if I buy an extra switch I
> > > >> may do this.
> > > >
> > > > You can connect them to the same switch (though you lose some
> > > > redundancy), or you can use two crossover cables and move the management IP
> > > > to the same ports you are using for the VMs.
> > > >
> > > > Greetings,
> > > > Juanra
> > > >
> > > >>

> > > >> I don't really understand why there is no backup heartbeat channel. LVS and
> > > >> MS cluster have this ability.
> > > >>
> > > >>>

> > > >>> ALOM can be used for fencing and can be on a separate subnet if required.
> > > >>>
> > > >> I used this for fencing_ipmilan.
> > > >>

> > > >>>
> > > >>> Regards
> > > >>>
> > > >>> Rajagopal
> > > >>>

> > > >>>
> > > >>>
> > > >>
> > > >> ----- End of message from raju.rajsand@xxxxxxxxx -----

> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> > >

> > >
> > > ----- End of message from robejrm@xxxxxxxxx -----
> > >
> > >
> > >
> > >
> > >

> > > ------------------------------
> > >
> > > Message: 4
> > > Date: Thu, 13 Aug 2009 15:29:43 +0200
> > > From: Juan Ramon Martin Blanco <robejrm@xxxxxxxxx>

> > > Subject: Re:  Is there any backup heartbeat channel
> > > To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > Message-ID:

> > > <8a5668960908130629n6ec05a88n463a3b03da331dae@xxxxxxxxxxxxxx>
> > > Content-Type: text/plain; charset="iso-8859-9"

> > >
> > > 2009/8/13 Hakan VELIOGLU <veliogluh@xxxxxxxxxx>
> > >
> > > > Thanks for all the answers.
> > > >

> > > > I think there is really no backup heartbeat channel. Maybe the reason is
> > > > GFS. DLM works on the heartbeat channel. If you lose your heartbeat you lose
> > > > your lock consistency, so it is better to fence the other node. For this
> > > > reason I think that if you don't have enough network interfaces on the server and
> > > > switch, losing the heartbeat network may shut down all the cluster members.
> > > >
> > > There is no backup heartbeat channel because you should do the backup at
> > > the operating system level, i.e. bonding.
> > > That's why you should use a bonded interface for the heartbeat channel with
> > > at least 2 ethernet slaves; going further (for better redundancy), each of
> > > the slaves should be on a different network card and you should connect
> > > each slave to a different switch.
> > > But what I am trying to explain is that you can also use that bonded logical
> > > interface for things other than heartbeat. ;)
> > >
> > > Greetings,
> > > Juanra
> > >
> > >
> > > > Hakan VELİOĞLU
> > > >

> > > >
> > > > ----- Message from robejrm@xxxxxxxxx ---------
> > > > Date: Thu, 13 Aug 2009 10:42:11 +0200
> > > > From: Juan Ramon Martin Blanco <robejrm@xxxxxxxxx>
> > > > Reply-To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > > Subject: Re:  Is there any backup heartbeat channel
> > > > To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > >
> > > >
> > > > 2009/8/13 Hakan VELIOGLU <veliogluh@xxxxxxxxxx>

> > > >>
> > > >> ----- Message from raju.rajsand@xxxxxxxxx ---------
> > > >>> Date: Thu, 13 Aug 2009 08:57:15 +0530
> > > >>> From: Rajagopal Swaminathan <raju.rajsand@xxxxxxxxx>
> > > >>> Reply-To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > >>> Subject: Re:  Is there any backup heartbeat channel
> > > >>> To: linux clustering <linux-cluster@xxxxxxxxxx>

> > > >>>
> > > >>>
> > > >>> Greetings,
> > > >>>
> > > >>>>
> > > >>>> 2009/8/12 Hakan VELIOGLU <veliogluh@xxxxxxxxxx>:

> > > >>>>
> > > >>>> Hi list,
> > > >>>>>
> > > >>>>> I am trying a two node cluster with RH 5.3 on Sun X4150 hardware. I

> > use
> > > >>>>> a
> > > >>>>>
> > > >>>>>
> > > >>>> IIRC, Sun x4150 has four ethernet ports. Two can be used for outside

> > > >>>> networking and two can be bonded and used for heartbeat.
> > > >>>>
> > > >>> I think I didn't explain my networking well. I use two ethernet ports for
> > > >>> Xen VMs; these are trunked and bonded ports. That leaves two. Our network
> > > >>> topology (which is out of my control) makes one port available for server
> > > >>> control (SSH).
> > > >>>
> > > >>
> > > >> So you can't use a bonded port for both server management and cluster
> > > >> communications, can you? You can configure active-passive bonding and
> > > >> then have many virtual interfaces on top of that, e.g. bond0:0,
> > > >> bond0:1, and assign them the IP addresses you need.
> > > >>
> > > >>
> > > >>> I use the other one with a crossover cable for heartbeat. So there is no
> > > >>> way of bonding these two interfaces. Of course, if I buy an extra switch I
> > > >>> may do this.

> > > >>>
> > > >>
> > > >> You can connect them to the same switch (though you lose some
> > > >> redundancy), or you can use two crossover cables and move the management IP
> > > >> to the same ports you are using for the VMs.
> > > >>
> > > >> Greetings,
> > > >> Juanra

> > > >>
> > > >>
> > > >>> I don't really understand why there is no backup heartbeat channel. LVS and
> > > >>> MS cluster have this ability.

> > > >>>
> > > >>>
> > > >>>> ALOM can be used for fencing and can be on a separate subnet if
> > > >>>> required.
> > > >>>>

> > > >>>> I used this for fencing_ipmilan.
> > > >>>
> > > >>>
> > > >>>> Regards
> > > >>>>
> > > >>>> Rajagopal

> > > >>>>
> > > >>>>
> > > >>>>

> > > >>>>
> > > >>> ----- End of message from raju.rajsand@xxxxxxxxx -----
> > > >>>
> > > >>>

> > > >>>
> > > >>>
> > > >>>
> > > >>>

> > > >>
> > > >
> > > > ----- End of message from robejrm@xxxxxxxxx -----
> > > >
> > > >

> > > >
> > > >
> > > >

> > >
> > > ------------------------------
> > >
> > > Message: 5
> > > Date: Thu, 13 Aug 2009 10:07:13 -0400
> > > From: Nehemias Jahcob <nehemiasjahcob@xxxxxxxxx>

> > > Subject:  RHCS on KVM
> > > To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > Message-ID:
> > > <5f61ab380908130707q5c936504k7351d0d6b3459090@xxxxxxxxxxxxxx>

> > > Content-Type: text/plain; charset="iso-8859-1"
> > >
> > > Hi.
> > >
> > > How do I create a 2-node cluster on RHEL 5.4 (or Fedora 10) with KVM?
> > >
> > > With Xen I follow this guide:
> > > http://sources.redhat.com/cluster/wiki/VMClusterCookbook?highlight=(CategoryHowTo)
> > >
> > > Do you have a guide to implementing RHCS on KVM?
> > >
> > > Thank you all.
> > > NJ

> > >
> > > ------------------------------
> > >
> > > Message: 6
> > > Date: Thu, 13 Aug 2009 14:16:47 +0000
> > > From: NTOUGHE GUY-SERGE <ntoughe@xxxxxxxxxxx>

> > > Subject:  Cman hang
> > > To: <linux-cluster@xxxxxxxxxx>
> > > Message-ID: <BAY119-W410E2F250E8B461752CFC9A5050@xxxxxxx>

> > > Content-Type: text/plain; charset="iso-8859-1"
> > >
> > >
> > >
> > > Hi gurus,
> > >
> > > I installed RHEL 5.3 on 2 servers which participate in a cluster
> > > composed of these 2 nodes:
> > > kernel version:
> > > kernel-headers-2.6.18-128.el5
> > > kernel-devel-2.6.18-128.el5
> > > kernel-2.6.18-128.el5
> > > cman-devel-2.0.98-1.el5_3.1

> > > cman-2.0.98-1.el5_3.1
> > > cluster-cim-0.12.1-2.el5
> > > lvm2-cluster-2.02.40-7.el5
> > > cluster-snmp-0.12.1-2.el5
> > > modcluster-0.12.1-2.el5
> > > When I try to start cman, the following message is printed:
> > > cman not started: Multicast and node address families differ. /usr/sbin/cman_tool: aisexec daemon didn't start
> > > [FAILED]
> > >
> > > I tried to mount gfs2
> > > and I got these messages:
> > > # mount -t gfs2 /dev/VolGroup01/LogVol01 /appli/prod --o lockTablename=arvclust:/appli/prod, Lockproto=lock_dlm
> > >
> > > /sbin/mount.gfs2: can't connect to gfs_controld: Connection refused

> > >
> > > /sbin/mount.gfs2: can't connect to gfs_controld: Connection refused
> > >
> > > /sbin/mount.gfs2: can't connect to gfs_controld: Connection refused
> > >

> > > /sbin/mount.gfs2: can't connect to gfs_controld: Connection refused
> > >
> > > Do you have any clues?
> > > Please help, it's urgent; I have spent a long time looking for a solution.

> > > regards
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ntoughe@xxxxxxxxxxx

> > >
> > >

> > >
> > > ------------------------------
> > >
> > > Message: 7
> > > Date: Thu, 13 Aug 2009 09:14:24 -0500
> > > From: David Teigland <teigland@xxxxxxxxxx>

> > > Subject: Re:  gfs2 mount hangs
> > > To: Wengang Wang <wen.gang.wang@xxxxxxxxxx>
> > > Cc: linux clustering <linux-cluster@xxxxxxxxxx>

> > > Message-ID: <20090813141424.GA8148@xxxxxxxxxx>
> > > Content-Type: text/plain; charset=us-ascii
> > >
> > > On Thu, Aug 13, 2009 at 02:22:11PM +0800, Wengang Wang wrote:

> > > > <cman two_node="1" expected_votes="2"/>
> > >
> > > That's not a valid combination, two_node="1" requires expected_votes="1".

> > >
> > > You didn't mention which userspace cluster version/release you're using, or
> > > include any status about the cluster. Before trying to mount gfs on either
> > > node, collect from both nodes:
> > >
> > > cman_tool status
> > > cman_tool nodes
> > > group_tool
> > >
> > > Then mount on the first node and collect the same information, then try
> > > mounting on the second node, collect the same information, and look for any
> > > errors in /var/log/messages.
> > >
> > > Since you're using new kernels, you need to be using the cluster 3.0 userspace
> > > code. You're using the old manual fencing config. There is no more
> > > fence_manual; the new way to configure manual fencing is to not configure any
> > > fencing at all. So, your cluster.conf should look like this:
> > >
> > > <?xml version="1.0"?>
> > > <cluster name="testgfs2" config_version="1">
> > > <cman two_node="1" expected_votes="1"/>
> > > <clusternodes>
> > > <clusternode name="cool" nodeid="1"/>
> > > <clusternode name="desk" nodeid="2"/>
> > > </clusternodes>
> > > </cluster>
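
The rule stated above (two_node="1" requires expected_votes="1") is easy to check mechanically before starting cman. The following is a rough sketch only: check_two_node is a hypothetical helper, not part of any cluster tool, and it checks that single rule and nothing else about the file.

```python
import xml.etree.ElementTree as ET

# The example configuration from the message above.
EXAMPLE = """<?xml version="1.0"?>
<cluster name="testgfs2" config_version="1">
<cman two_node="1" expected_votes="1"/>
<clusternodes>
<clusternode name="cool" nodeid="1"/>
<clusternode name="desk" nodeid="2"/>
</clusternodes>
</cluster>
"""


def check_two_node(conf_xml):
    """Return a list of problems with the two_node/expected_votes pairing."""
    root = ET.fromstring(conf_xml)
    problems = []
    # Look at every <cman> element, wherever it sits in the file.
    for cman in root.iter("cman"):
        if cman.get("two_node") == "1" and cman.get("expected_votes") != "1":
            problems.append('two_node="1" requires expected_votes="1"')
    return problems


if __name__ == "__main__":
    print(check_two_node(EXAMPLE))
```

An empty list means the pairing is consistent; the combination from the original question (two_node="1" with expected_votes="2") would be reported.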
> > >
> > > Dave
> > >
> > >
> > >
> > > ------------------------------
> > >

> > > Message: 8
> > > Date: Thu, 13 Aug 2009 10:39:46 -0400
> > > From: Lon Hohberger <lhh@xxxxxxxxxx>
> > > Subject: Re: Qdisk question
> > > To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > Message-ID: <1250174386.23376.1440.camel@xxxxxxxxxxxxxxxxxxxxx>
> > > Content-Type: text/plain
> > >
> > > On Thu, 2009-08-13 at 00:45 +0200, brem belguebli wrote:
> > >
> > > > My understanding of qdisk is that it is used as a tie-breaker, but it
> > > > looks like it is more a heartbeat vector than a simple tie-breaker.
> > >
> > > Right, it's a secondary membership algorithm.
> > >
> > >
> > > > Up to here, no real problem: if one site is cut off from the
> > > > other prod site and also from the third site (hosting the iscsi target
> > > > qdisk), the 2 nodes from the failing site get evicted from the cluster.
> > > >
> > > >
> > > > But what if my third site gets isolated while the 2 prod ones are
> > > > fine?
> > >
> > > Qdisk votes will not be presented to CMAN any more, but the two sites
> > > should remain online if they still have a "majority" of votes.
> > >
> > >
> > > > The real question is what happens in case all the nodes lose access
> > > > to the qdisk while they're still able to see each other?
> > >
> > > Qdisk is just a vote like other voting mechanisms. If all nodes lose
> > > access at the same time, it should behave like a node death. However,
> > > the default action if _one_ node loses access is to kill that node (even
> > > if CMAN still sees it).
> > >
> > >
> > > > The 4 nodes each have 1 vote and the qdisk has 1 vote. The expected
> > > > quorum is 3.
> > >
> > >
> > > > If I lose the qdisk, the number of votes falls to 4, the cluster is
> > > > quorate (4>3) but it looks like everything goes bad: each node
> > > > deactivates itself as it can't write its alive status (--> heartbeat
> > > > vector) to the qdisk even if the network heartbeating is working
> > > > fine.
> > >
> > > What happens specifically? Most of the actions qdiskd performs are
> > > configurable. For example, if the nodes are rebooting, you can turn
> > > that behavior off.
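
The vote arithmetic in the question above can be written out as a short sketch. It assumes quorum is a simple majority of the expected votes, floor(expected / 2) + 1, which matches the "expected quorum is 3" figure quoted for 5 votes; the function names are illustrative only. It models vote counting alone and says nothing about what qdiskd itself does to a node that loses access to the disk.

```python
def quorum(expected_votes):
    """Votes needed for quorum, assuming a simple majority rule."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True if the votes currently present reach the quorum threshold."""
    return votes_present >= quorum(expected_votes)


node_votes = [1, 1, 1, 1]   # four nodes, one vote each
qdisk_votes = 1             # the quorum disk contributes one vote
expected = sum(node_votes) + qdisk_votes

print(quorum(expected))                        # 3
print(is_quorate(sum(node_votes), expected))   # True: 4 node votes, qdisk lost
print(is_quorate(2, expected))                 # False: only two nodes left
```

So by votes alone the four nodes stay quorate without the qdisk, which suggests the node deactivation described above comes from qdiskd's configurable actions rather than from loss of quorum.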
> > >
> > >
> > >
> > > I wrote a simple 'ping' tiebreaker based on the behaviors in RHEL3. It
> > > functions in many ways in the same manner as qdiskd with respect to vote
> > > advertisement to CMAN, but without needing a disk - maybe you would find
> > > it useful?
> > >
> > > http://people.redhat.com/lhh/qnet.tar.gz

> > >
> > > -- Lon
> > >
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 9
> > > Date: Thu, 13 Aug 2009 17:02:15 +0200
> > > From: Juan Ramon Martin Blanco <robejrm@xxxxxxxxx>
> > > Subject: Re: Cman hang
> > > To: linux clustering <linux-cluster@xxxxxxxxxx>
> > > Message-ID:
> > > <8a5668960908130802p4f5168cbueda86d1e6f1324bb@xxxxxxxxxxxxxx>
> > > Content-Type: text/plain; charset="iso-8859-1"
> > >
> > > On Thu, Aug 13, 2009 at 4:16 PM, NTOUGHE GUY-SERGE <ntoughe@xxxxxxxxxxx> wrote:
> > >
> > > >

> > > > Hi gurus,
> > > >
> > > > I installed RHEL 5.3 on 2 servers, which are the 2 nodes of a
> > > > cluster. Kernel and cluster package versions:
> > > > kernel-headers-2.6.18-128.el5
> > > > kernel-devel-2.6.18-128.el5
> > > > kernel-2.6.18-128.el5
> > > > cman-devel-2.0.98-1.el5_3.1
> > > > cman-2.0.98-1.el5_3.1
> > > > cluster-cim-0.12.1-2.el5
> > > > lvm2-cluster-2.02.40-7.el5
> > > > cluster-snmp-0.12.1-2.el5
> > > > modcluster-0.12.1-2.el5
> > > > When I try to start cman, I get the following message:
> > > > cman not started: Multicast and node address families differ.
> > > > /usr/sbin/cman_tool: aisexec daemon didn't start
> > > > [FAILED]
> > > >
> > > Please, show us your cluster.conf file so we can help.
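
While waiting for the cluster.conf, one thing that can be checked is how each node name resolves. This is a sketch under an assumption the thread does not confirm: that the "address families differ" error appears when a node name resolves to a different family (for example IPv6, or not at all) than the multicast address cman uses. The host names are the ones from the cluster.conf posted at the top of this thread; the helper names are hypothetical.

```python
import socket


def address_families(hostname):
    """Return the set of address families that a host name resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[0] for info in infos}


def describe(hostname):
    """One-line summary of how a node name resolves on this host."""
    try:
        families = address_families(hostname)
    except socket.gaierror as exc:
        return f"{hostname}: does not resolve ({exc})"
    names = sorted(socket.AddressFamily(f).name for f in families)
    return f"{hostname}: {', '.join(names)}"


if __name__ == "__main__":
    # Node names as posted in the cluster.conf earlier in this thread.
    for node in ("gs21spli003.occ.lan", "gs21spli004.occ.lan"):
        print(describe(node))
```

If a name does not resolve, or resolves to an unexpected family, adding it to /etc/hosts with the address of the interface meant for cluster traffic is the fix suggested elsewhere in this thread.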

> > >
> > > Regards,
> > > Juanra
> > >
> > > >
> > > > I tried to mount gfs2
> > > > and I got these messages:
> > > > # mount -t gfs2 /dev/VolGroup01/LogVol01 /appli/prod --o
> > > > lockTablename=arvclust:/appli/prod, Lockproto=lock_dlm
> > > >
> > > > /sbin/mount.gfs2: can't connect to gfs_controld: Connection refused
> > > >
> > > > /sbin/mount.gfs2: can't connect to gfs_controld: Connection refused
> > > >
> > > > /sbin/mount.gfs2: can't connect to gfs_controld: Connection refused
> > > >
> > > > /sbin/mount.gfs2: can't connect to gfs_controld: Connection refused
> > > >
> > > > Do you have any clues?
> > > > Please help, it is urgent; I have spent a long time looking for a solution.
> > > > Regards
> > > >
> > > > ntoughe@xxxxxxxxxxx

> > > ------------------------------
> > >
> > > End of Linux-cluster Digest, Vol 64, Issue 16

> > > *********************************************
> >
> End of Linux-cluster Digest, Vol 64, Issue 18
> *********************************************

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster